Event subscription in a PyTango device

Hi all,

Here at MAX-IV, we have a lot of higher-level devices that subscribe to change events from lower-level devices (typically, valve devices subscribing to a PLC device).

One issue I have with the event subscription is that the event callback thread and a client request thread might run concurrently, since there is no implicit locking.

Therefore I'm using explicit locking, but it is not a perfect solution and is actually quite hard to maintain. For instance, the device can deadlock if the two threads end up waiting on each other's lock: the client thread holds the monitor lock and waits for the explicit lock, while the event callback thread holds the explicit lock and waits for the monitor lock (which it tries to acquire if it pushes an event, for instance).
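
To make the problem concrete, here is a minimal sketch of the pattern I mean (the device and attribute names are made up):
    import threading
    from PyTango.server import Device, attribute

    class ValveDevice(Device):

        def init_device(self):
            Device.init_device(self)
            self._lock = threading.Lock()  # the explicit lock
            self._pressure = 0.0

        @attribute(dtype=float)
        def Pressure(self):
            # client thread: already holds the monitor lock,
            # now waits for the explicit lock
            with self._lock:
                return self._pressure

        def event_callback(self, event):
            # event thread: takes the explicit lock first...
            with self._lock:
                self._pressure = event.attr_value.value
                # ...then pushing the event tries to take the monitor lock
                self.push_change_event('Pressure', self._pressure)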

A solution could be to have the monitor lock (called AutoTangoMonitor in C++) accessible via a device method:
    def event_callback(self, event):
        with self.get_monitor_lock():
            self.process_event(event)

Even better, the device itself could provide a method for event subscription:
    def init_device(self):
        attr_proxy = AttributeProxy('a/b/c/d')
        args = attr_proxy, EventType.CHANGE_EVENT, self.event_callback
        self.subscribe_event(*args) # lock the callback automatically

The implementation of the method would look like:
    def subscribe_event(self, attr_proxy, event_type, callback):
        def safe_callback(event):
            with self.get_monitor_lock():
                return callback(event)
        return attr_proxy.subscribe_event(event_type, safe_callback)  

This is the best solution I could find, and it doesn't look too hard to implement. Please let me know if there is a better way to deal with this kind of issue; otherwise, I'll file a feature request.

Thanks,

Vincent
Hi Vincent,

I stumbled across a similar problem some time ago. I thought about exposing the tango monitor to python but I suspect that this would just create another deadlock between the tango monitor and the python GIL.

Example:

th1: client request to read attribute
th1: locks the tango monitor
th2: event callback
th2: locks the python GIL
th2: waits for the tango monitor (held by th1)
th1: waits for the python GIL (held by th2)
— Deadlock —


My workaround for the problem is to have a worker thread waiting for jobs.
Anytime I have a blocking Tango call, I throw it into the worker thread.
I call this a workaround on purpose because it is not actually a solution.
I think the real problem is that Tango is using the same lock to handle different things.
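
As a rough sketch of the worker idea (the names are illustrative, not my actual code):
    import queue
    import threading

    class Worker:
        """A single thread that serializes blocking Tango calls."""

        def __init__(self):
            self._jobs = queue.Queue()
            self._thread = threading.Thread(target=self._run)
            self._thread.daemon = True
            self._thread.start()

        def _run(self):
            while True:
                func, args = self._jobs.get()
                try:
                    func(*args)
                except Exception:
                    pass  # real code would log the failure

        def submit(self, func, *args):
            self._jobs.put((func, args))

    # e.g. instead of pushing the event from the callback thread:
    # worker.submit(self.push_change_event, 'Pressure', value)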

Anyway, another thing you might try is to completely disable the tango serialization model:


util = PyTango.Util.instance()
util.set_serial_model(PyTango.SerialModel.NO_SYNC)

This will disable the Tango monitor completely.
The default value is BY_DEVICE, which normally prevents concurrent access to the device.
If your device cannot handle concurrent reads/commands, then it is completely up to you
to implement the serialization you need.

Hope it helps

Tiago


Thanks Tiago for the quick answer!

The GIL deadlock makes sense; I hadn't thought about that.
However, disabling the serialization or having a worker thread is not going to work for me. My plan is to have the devices working like a typical single-threaded asynchronous program. The fact that the server is actually multi-threaded is not a problem as long as each client request / polling callback / event callback runs one after the other.

This approach has the following characteristics:
- no locks, so it is easier to maintain and carries less cognitive load
- no performance drawback, since the parallelization is limited by the GIL anyway
- blocking IO calls are not an issue, since I'm mostly relying on change events from lower devices.

Another way to achieve the same behavior would be to use a local client calling a dedicated command:
    def event_callback(self, event):
        proxy = DeviceProxy(self.get_name())
        proxy.process_event(event.attr_value.value)

    @command(dtype_in=int)
    def process_event(self, value):
        self.do_something(value)

But it feels like overkill (an extra thread and an extra socket) and it has limitations (no exception handling, a single data type).

I thought about two solutions to avoid the GIL/monitor deadlock.
The first one is to expose the Tango monitor but to release the GIL while the program tries to acquire it, just like AutoPythonGIL does (in pytgutils.h) but the other way around. There is a paragraph about this in the boost.python HowTo, but I'm definitely not a boost expert so I might be missing something.

The second one seems even simpler. An optional lock argument could be added to the subscribe method to pass the device reference to the c++ callback:
    def init_device(self):
        attr_proxy = AttributeProxy('a/b/c/d')
        args = attr_proxy, EventType.CHANGE_EVENT, self.event_callback
        self.subscribe_event(*args, lock=self)

The subscribe_event method would save the device reference in the PyCallBackPushEvent object, just like it does with the callback. Then, when an event is received, the monitor lock would be acquired before the GIL. In src/boost/cpp/callback.cpp:
    template<typename OriginalT>
    static void _push_event(PyCallBackPushEvent* self, OriginalT * ev)
    {
        […]
        // get the device reference somehow
        Tango::AutoTangoMonitor tango_guard(&dev);
        AutoPythonGIL gil;
        […]
    }

Again, I might be missing something, but let me know if you plan to experiment; I'll be happy to help!

Thanks,

Vincent
Hi Vincent,

Sorry for the late answer.

Vincent M
My plan is to have the devices working like a typical single-threaded asynchronous program. The fact that the server is actually multi-threaded is not a problem as long as each client request / polling callback / event callback runs one after the other.

I assume you are I/O bound (not CPU bound). If that is the case, you might consider gevent :). I have been working on an experimental gevent-friendly PyTango server. The results seem promising. The code is already available in the latest version of PyTango.
Here is a snippet.
I have already used this in a server that is in production on some beamlines at the ESRF.
Be aware that if your server communicates with other devices, it should use ``PyTango.gevent.DeviceProxy``.
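
In short, a gevent-mode server looks roughly like this (a simplified sketch based on the green-mode server API; ``read_voltage_from_instrument`` stands for your own I/O code):
    from PyTango import GreenMode
    from PyTango.server import Device, DeviceMeta, attribute, run

    class PowerSupply(Device):
        __metaclass__ = DeviceMeta

        @attribute(dtype=float)
        def voltage(self):
            # a blocking read here yields cooperatively under gevent,
            # so other requests to the device are not stuck behind it
            return self.read_voltage_from_instrument()

    if __name__ == '__main__':
        run([PowerSupply], green_mode=GreenMode.Gevent)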

Anyway, I think your suggestion to export the TangoMonitor is feasible. Do you feel confident enough to make a pull request on GitHub, or do you prefer that I do the implementation?

As a principle, I try to keep the API as close as I can to the TANGO C++ API, so I would avoid changing the signature of subscribe_event if possible.

FYI,

push_change_event(<attr name>, <value>)

is equivalent to:


with tangomonitor:
    <attr>.set_value(<value>)
    <attr>.fire_change_event()
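
Spelled out with the actual server-side calls, that would look something like this (a sketch: ``Pressure`` and ``value`` are placeholders, and the monitor context manager is the one you proposed, not an existing API):
    # what push_change_event('Pressure', value) does internally:
    attr = self.get_device_attr().get_attr_by_name('Pressure')
    with self.get_monitor_lock():  # stands in for the tango monitor
        attr.set_value(value)
        attr.fire_change_event()
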
Hi Tiago,

Yes, most of our devices are actually IO bound; that's why I'm very interested in taking advantage of single-threaded asynchronous libraries like gevent. Your example looks very promising! :)

Is it possible to disable the monitor lock completely in your example?

A typical use case is:
  • Client 1 asks to read an attribute from the hardware through a socket (patched with gevent)
  • Client 2 asks the state (does not depend on the hardware)
  • Client 2 gets the state
  • Client 1 gets the read value
I find it very frustrating to block-wait on IO in a device that is supposed to work asynchronously.

Also, I've been looking into asyncio and I find those explicit coroutines very interesting,
especially with the new async and await keywords now available in Python 3.5.
In a Tango server, it could provide us with a very elegant syntax like:
    async def read_voltage(self):
        return await self.instrument.read_voltage_coroutine()
And there's a library called aiozmq that interfaces asyncio with zmq, so it could help us move toward a pure Python implementation once the CORBA-to-ZMQ transition is over :)

All right for the pull request! I don't really feel comfortable with boost, but I'll see if there is someone in the team to help me with it. Since you don't want to change the DeviceProxy.subscribe_event prototype, we'll try to make the monitor lock available through the server API instead, probably with a method called Device.get_monitor_lock() that returns a context manager.

Oh, I hadn't thought of replacing `push_change_event` with `set_value` + `fire_change_event` in order to avoid the deadlock. I'll give it a try as well!

Thanks,

Vincent
Vincent M
Is it possible to disable the monitor lock completely in your example?
When you run the server in gevent mode, the TANGO serial model is set to NO_SYNC, virtually disabling the monitor.

Yes, asyncio rocks! I agree, the new keywords look very interesting.
I have to investigate a little more if/how to change the TANGO event loop to make it asyncio friendly.
Honestly, I haven't thought about it too much because they are still changing a lot of things in the Python API. That's why until now all my efforts have gone into gevent.
If there is any asyncio expert/fan listening: I'd be glad to team up with you to make this work :)

I didn't know about aiozmq. It might be interesting to see how they did it and steal some ideas for Tango.

A pure Python implementation would help in making it coroutine friendly (Matias Guijarro has already tested a pure Python implementation of TANGO using a pure Python CORBA implementation). The problem is that implementing all the TANGO logic for both client and server would take a lot of effort.


Vincent M
All right for the pull request! I don't really feel comfortable with boost, but I'll see if there is someone in the team to help me with it
Me neither :). I can help if you need.

Thanks for the insights

Cheers
Tiago
TCoutinho
When you run the server in gevent mode, the TANGO serial model is set to NO_SYNC, virtually disabling the monitor.

Alright, that definitely makes sense!

TCoutinho
If there is any asyncio expert/fan listening: I'd be glad to team up with you to make this work

Well, I had a look and realized it was pretty easy to implement, since you did most of the work by using an executor for gevent. I committed an asyncio executor here, but it is still untested. I'll play with it a bit before sending a pull request :)
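
For the curious, the core of the idea is just to marshal every callback onto a single asyncio event loop running in its own thread (an illustrative sketch, not the committed code):
    import asyncio
    import threading

    class AsyncioExecutor:
        """Runs all submitted callbacks one after another on one event loop."""

        def __init__(self):
            self.loop = asyncio.new_event_loop()
            thread = threading.Thread(target=self._run)
            thread.daemon = True
            thread.start()

        def _run(self):
            asyncio.set_event_loop(self.loop)
            self.loop.run_forever()

        def submit(self, fn, *args):
            # thread-safe handoff from any Tango thread to the loop
            return asyncio.run_coroutine_threadsafe(self._call(fn, *args),
                                                    self.loop)

        async def _call(self, fn, *args):
            result = fn(*args)
            if asyncio.iscoroutine(result):
                result = await result
            return result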

TCoutinho
The problem is that implementing all the TANGO logic for both client and server would take a lot of effort.

Yes, it surely is a huge amount of work! Maybe some day…

I'll have a discussion with the team to see if someone else is interested in having the monitor lock available in PyTango. I think people are also interested in exposing other methods like fill_attr_polling_buffer, and I have a few pieces of code that could be useful to have in the library, so you might receive a few pull requests over the next few weeks :)

Cheers,

Vincent