Multiprocessing over PyTango

Hi all,

here is the latest problem we hit while migrating our project 'ExTra' from Tango V8 to V9 (see previous post).

We ran into trouble with parallel code using the Python multiprocessing module. While it worked with the previous version, we now get this error: "terminate called after throwing an instance of 'omni_thread_fatal'", just by executing this piece of code within a DS:

from multiprocessing import Pool
pool = Pool(processes=4)
terminate called after throwing an instance of 'omni_thread_fatal'


Basically, the code does not always crash; it is simply executed on a single processor (and, moreover, that one is not fully loaded). I would add that the same code works correctly when executed outside of a DS.

Do you have any tips? Is it due to the way DS processes are executed? Sorry, but I could not search the existing posts; the search service fails for some reason.

In the worst case, what would be another way to run parallel tasks within a DS?

Many thanks
Stephane
Hi Stephane

Can you provide a complete example? I could not reproduce the problem using the latest PyTango (development branch) with Python 3.6.8 and cppTango 9.3.2. I also tried with PyTango 9.2.5 on Python 2.7, and cppTango 9.2.5.

The docs have some notes on multiprocessing on the client side. Maybe they apply to the server too? You could try that (although you need at least PyTango 9.3.0):
https://pytango.readthedocs.io/en/v9.3.0/howto.html#using-clients-with-multiprocessing
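I don't have the exact wording of that how-to at hand, but a common way to avoid fork-related ORB thread trouble is the 'spawn' start method. A minimal sketch, with Tango left out entirely and all names mine:

```python
# A sketch of one common mitigation (my assumption, not a quote from the
# linked docs): create the pool with the 'spawn' start method, so worker
# processes start as fresh interpreters instead of fork()ed copies of a
# process that may already hold omniORB threads.
import multiprocessing

def square(x):
    return x * x  # trivial stand-in for real work

if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=4) as pool:
        print(pool.map(square, range(4)))  # [0, 1, 4, 9]
```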
I added your lines to the ClockDS.py example from PyTango:


(buildenv) tango-cs@9f290e95fd04:~/build/tango-controls/pytango/examples/Clock$ git diff ClockDS.py
diff --git a/examples/Clock/ClockDS.py b/examples/Clock/ClockDS.py
index 4401299..462cddc 100644
--- a/examples/Clock/ClockDS.py
+++ b/examples/Clock/ClockDS.py
@@ -19,6 +19,8 @@ from enum import IntEnum
from tango import AttrWriteType
from tango.server import Device, attribute, command

+from multiprocessing import Pool
+pool = Pool(processes=4)

class Noon(IntEnum):
AM = 0 # DevEnum's must start at 0


Results of a run, pushing Ctrl+c at the end to exit:


(buildenv) tango-cs@9f290e95fd04:~/build/tango-controls/pytango/examples/Clock$ python -m tango.test_context ClockDS.Clock --host $(hostname)
Can't create notifd event supplier. Notifd event not available
Ready to accept request
Clock started on port 8888 with properties {}
Device access: tango://9f290e95fd04:8888/test/nodb/clock#dbase=no
Server access: tango://9f290e95fd04:8888/dserver/Clock/clock#dbase=no
^CProcess ForkPoolWorker-1:
Process ForkPoolWorker-2:
Process ForkPoolWorker-3:
Traceback (most recent call last):
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/queues.py", line 334, in get
with self._rlock:
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
Process ForkPoolWorker-4:
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/queues.py", line 334, in get
with self._rlock:
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/queues.py", line 335, in get
res = self._reader.recv_bytes()
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt

Traceback (most recent call last):
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/queues.py", line 334, in get
with self._rlock:
File "/home/tango-cs/build/miniconda/envs/buildenv/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
KeyboardInterrupt
Hi Anton,

Thank you for your reply. Let's set aside the 'omni_thread_fatal' error for a while (it might be due to omniORB library versions), because it is not correlated with the main problem. Let's focus on the multiprocessing code that should work. We have investigated a bit more to understand what goes wrong: it actually comes down to the base device class inherited by the DS.

On one hand, consider your example, which we slightly modified to involve real multiprocessing code (please see the first file, Device_Clock.py, and its ctime command). This DS is based on the class 'Device', and it works correctly. Here is the output when we launch the DS and call the ctime command:


$ python3 Device_Clock.py clock
Ready to accept request
In parallel
0 start
1 start
2 start
3 start
1 end
2 end
0 end
3 end
compute time 3.822722911834717
[0, 1, 2, 3]
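Since the attached Device_Clock.py is not inlined in this thread, the parallel part of the ctime command probably looked roughly like this (a hypothetical reconstruction; names, sleep duration and the Tango Device plumbing are my guesses):

```python
# Hypothetical reconstruction of the parallel part of the ctime command
# (the attached file is not inlined here, so all names are assumptions).
import time
from multiprocessing import Pool

def job(i):
    print(i, "start")
    time.sleep(0.2)  # stand-in for the real computation
    print(i, "end")
    return i

def run_parallel(n=4):
    print("In parallel")
    t0 = time.time()
    with Pool(processes=n) as pool:
        results = pool.map(job, range(n))
    print("compute time", time.time() - t0)
    return results

if __name__ == "__main__":
    print(run_parallel())
```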


On the other hand, we used Pogo to generate the same kind of DS. This time, as Pogo made it, it is based on Device_4Impl (please see the second file, Device4Impl_Clock.py). Unfortunately, this one does not work on Tango 9. As you can see from the output, it gets stuck:

$ python3 Device4Impl_Clock.py clock
Ready to accept request
In parallel


We are pretty sure that multiprocessing is not started at all, since no load appears on the processors.
Please also note that the same DS used to work on Tango 8.
Our configuration is Ubuntu 18.04, Tango 9.2.5a, Python 3.6.8 and PyTango 9.3.0.

What do you think of the difference between these 2 DS?

Many thanks for your help.

Stephane
Dear Anton,

Good news!! We finally also tried updating the omniORB4 library from version 4.1.6 to 4.2.3; it fixed both the error message and the multiprocessing behaviour!!

Now everything is fine.

Thanks anyway for your support.
Stephane
Hi Stephane

Glad to hear you have found a solution for your problem.

> What do you think of the difference between these 2 DS?

In PyTango, if a device server inherits from `Device`, then it automatically gets the latest version of the implementation (that PyTango knows about). PyTango 9.3.0 would use `Device_5Impl` instead of `Device_4Impl`. I'm not sure why that made a difference to the multiprocessing.

Pogo has two Python code generation options. The PythonHL option uses `Device` and the high-level API, which I find much easier to use.
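As an aside (my own sketch, nothing from the thread): if a fork-based pool inside a server ever misbehaves again, Python 3.7+ lets you force the 'spawn' start method on a ProcessPoolExecutor, so the workers begin as clean interpreters with no inherited omniORB threads:

```python
# Sketch of a spawn-based executor (an alternative to try, not something
# from this thread; requires Python 3.7+ for the mp_context argument).
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def work(i):
    return i * i  # placeholder for the real per-item computation

def run_jobs(n=4):
    # 'spawn' starts each worker as a fresh interpreter, so no ORB threads
    # are inherited from the server process.
    ctx = multiprocessing.get_context("spawn")
    with ProcessPoolExecutor(max_workers=n, mp_context=ctx) as ex:
        return list(ex.map(work, range(n)))

if __name__ == "__main__":
    print(run_jobs())  # [0, 1, 4, 9]
```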

Regards,
Anton
 