Device server debugging

Hi all,

I'm posting the question here since it is about a PyTango device server, but I think the issue is related to the C++ layer.

Recently some of our device servers have randomly started to use an unusual amount of CPU (15 to 30 % for a few, very simple devices). After investigation, it turns out that this is caused by a single thread. This thread is always the 7th thread created in the server. For instance:

TIDs: 101 (main thread), 105, 106, 107, 108, 109, 110, 111, 112, …
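
For anyone wanting to reproduce this kind of per-thread measurement on Linux, here is a minimal sketch that reads cumulative CPU time for each thread from /proc. It is not the tool used above. The helper names (parse_stat, sample, busiest) are made up for illustration, and the field positions follow the proc(5) layout of the stat file.

```python
import os


def parse_stat(text):
    """Return (tid, name, cpu_ticks) from the content of a /proc/.../stat file."""
    # The thread name sits in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = text.rpartition(")")
    tid_str, _, name = head.partition("(")
    fields = tail.split()
    # `tail` starts at field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(fields[11]), int(fields[12])
    return int(tid_str), name, utime + stime


def sample(pid):
    """Map each thread ID of process `pid` to its cumulative CPU ticks."""
    ticks = {}
    task_dir = "/proc/%d/task" % pid
    for entry in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, entry, "stat")) as f:
                tid, _, cpu = parse_stat(f.read())
        except (IOError, OSError):
            continue  # the thread exited between listdir() and open()
        ticks[tid] = cpu
    return ticks


def busiest(before, after):
    """Return (tid, delta_ticks) pairs, highest CPU consumer first."""
    deltas = [(tid, after[tid] - before[tid]) for tid in after if tid in before]
    return sorted(deltas, key=lambda item: item[1], reverse=True)
```

Calling sample(pid) twice a few seconds apart and passing both results to busiest() gives the threads ranked by CPU consumed in that interval. The TIDs can then be compared with the creation order shown by `ps -L` or `top -H`.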

The tango documentation lists 8 specific threads to run the device server. If they're listed in the order of creation (this is not explicitly specified), it'd mean:

  • 101: main thread waiting in the ORB main loop
  • 105: ORB implementation thread (POA thread)
  • 106: ORB implementation thread (POA thread)
  • 107: ORB scavenger thread
  • 108: signal thread
  • 109: heartbeat thread (needed by the Tango event system)
  • 110: Zmq implementation thread
  • 111: Zmq implementation thread
  • 112: polling thread
So it seems like the culprit could be one of the two zmq threads but the device doesn't push events, except for the state that is polled. However, it does subscribe to events from other devices through a DeviceProxy.

My questions are:

  • Can I rely on the documentation for the thread creation order?
  • Is the 7th thread actually a zmq thread? Heartbeat or data thread?
  • Can a client event subscription interfere with the server event threads?
Thanks,

Vincent
Hi Vincent

Which release of ZMQ are you using?

Manu
Hi Manu,

That's the package installed in production (CentOS 6 64bits):
zeromq3.x86_64      3.2.5-1.el6
Vincent,

Please update to at least ZMQ 4.0.5. I know this is not the latest one (4.1.3) but this is the one we use here at ESRF and it
works fine. We haven't yet tried Tango with newer ZMQ releases.
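
To check which libzmq a host actually provides, something like the following sketch can help. It loads the shared library through ctypes and calls the C function zmq_version(). The helper names libzmq_version and is_at_least are hypothetical. Note the assumption that ctypes.util.find_library resolves the same libzmq the device server is linked against, which may not hold if several versions are installed.

```python
import ctypes
import ctypes.util


def libzmq_version():
    """Return (major, minor, patch) of the libzmq found on this host, or None."""
    path = ctypes.util.find_library("zmq")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # C prototype: void zmq_version(int *major, int *minor, int *patch);
    lib.zmq_version.restype = None
    major, minor, patch = ctypes.c_int(), ctypes.c_int(), ctypes.c_int()
    lib.zmq_version(ctypes.byref(major), ctypes.byref(minor), ctypes.byref(patch))
    return major.value, minor.value, patch.value


def is_at_least(version, minimum=(4, 0, 5)):
    """True if `version` is known and not older than `minimum`."""
    return version is not None and tuple(version) >= tuple(minimum)
```

For example, is_at_least(libzmq_version()) is False on a host that only ships 3.2.5, and also False when no libzmq can be located at all.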

About your problem and your question:
The doc simply lists the DS process threads. There is no guarantee that the order in which they are listed in the doc is the
order in which they are created in the DS process.
Anyway, in the past we also found some DS processes eating a core as fast as possible. We arrived at the same conclusions as you.
The culprit was one of the ZMQ threads. I sent a question to the ZMQ mailing list but did not get a real answer.
Anyway, since we upgraded our system to 4.0.5, the problem has disappeared.

Hoping this helps

Manu
All right, good to see we're not the only ones having this issue!

Which version of Tango are you running at ESRF?
Is Tango 8.1.2 compatible with zeromq 4.0.5 or do we have to upgrade it as well?

Thanks for your quick answers,

Vincent
Well, at ESRF we are using Tango 9 because we play the role of guinea pig before the release is made official.

AFAIK, if you have installed the patch for Tango bug 662 (available on the web site), you should not have any compatibility
issue. By the way, installing all available patches is also a good idea.
If you are using the latest Tango Debian packages, all patches are there.

Cheers

Manu
Manu wrote:
"If you are using the latest Tango Debian packages, all patches are there."

Is there a package archive published by the Tango project? I'm currently using the Tango packaged with Ubuntu 14.04. Are these also up to date, or what is the recommended way of ensuring that you have the most up-to-date debs installed?

Thanks
Neilen
 