Limitation on client connections?

Hi all,

Has anybody experienced limitations with a Python client that connects to a lot of devices?

At SOLEIL, we have some scripts that connect to lots of devices.
With the following example and around 15,000 devices, it systematically returns the same errors towards the end of the script (around 800 to 1000 errors per execution). The error returned is:

DevFailed[
DevError[
desc = TRANSIENT CORBA system exception: TRANSIENT_ConnectFailed
origin = Connection::connect
reason = API_CorbaException
severity = ERR]

DevError[
desc = Failed to connect to device tdl/dg/calc-xbpm-ode
origin = Connection::connect
reason = API_CantConnectToDevice
severity = ERR]
]


The code to reproduce is:

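from PyTango import Database, DeviceProxy  # the module is named "tango" in recent PyTango releases

dbds = Database()
res = dbds.get_device_exported('*')
i = 0  # devices tried
j = 0  # errors seen
for dev in res:
    i = i + 1
    try:
        # Each proxy opens a connection to the device's server
        DeviceProxy(dev).state()
    except Exception as ex:
        j = j + 1
        print(ex)
print("errors: %d" % j)
print(i)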


Is there a known limitation?
Is there a way to work around these errors?
Or is it a bug?

Best Regards,

Gwenaëlle.
Hi Gwenaëlle,

Do all the devices belong to the same server?

Could it be an OS limit on the number of open file descriptors (sockets)?
Usually on Linux, for a "normal" user, it is set to 1024.
You can check with
ulimit -a
and look for the open files line.
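The same limit can also be read from inside the Python client itself. The following is only a minimal sketch, assuming a Unix-like system where the standard `resource` module is available; the helper name `open_file_limits` is made up for illustration.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
# The soft limit is the one that "ulimit -n" reports and that the process actually hits.
print("open files: soft=%s hard=%s" % (soft, hard))
```

If the soft value printed here is much lower than the number of devices the script contacts, the descriptor limit is a plausible suspect.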
Hi Gwen,

Are you sure the devices for which you are getting some errors are really up and running?
A device can be flagged as exported in the Database but could actually be stopped.
It may still be flagged as exported in the Database because it was not stopped cleanly (crash, kill -9).
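One way to tell such stale entries apart from real connection problems might be to ping each device and collect the ones that do not answer. This is a rough sketch, not a tested recipe: the helper `split_by_reachability` and its `make_proxy` parameter are hypothetical, and the default assumes a PyTango version that exposes `DeviceProxy` in the `tango` module. Note that it opens one connection per device too, so it is subject to the same file descriptor limit.

```python
def split_by_reachability(device_names, make_proxy=None):
    """Split device names into (alive, dead) lists based on a ping.

    make_proxy is any callable returning an object with a ping() method;
    it defaults to tango.DeviceProxy (assumption: PyTango is installed).
    """
    if make_proxy is None:
        # Imported lazily so the function can be exercised without Tango.
        from tango import DeviceProxy as make_proxy
    alive, dead = [], []
    for name in device_names:
        try:
            # Creating the proxy or pinging it fails if the device is down.
            make_proxy(name).ping()
        except Exception:
            dead.append(name)
        else:
            alive.append(name)
    return alive, dead
```

Running it on a small subset of the exported devices first keeps the number of simultaneous connections low, so that a failure points at the device rather than at the client's limits.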

The remark from Tiago about resource limits is interesting too.
I've seen that you can also get IMP_LIMIT CORBA system exceptions with your example when it is executed on a big Tango database.

Here is an extract from the CORBA specification for this IMP_LIMIT exception:

This exception indicates that an implementation limit was exceeded in the ORB run time. For example, an ORB may
reach the maximum number of references it can hold simultaneously in an address space, the size of a parameter may
have exceeded the allowed maximum, or an ORB may impose a maximum on the number of clients or servers that can
run simultaneously.

Cheers,
Reynald
Rosenberg's Law: Software is easy to make, except when you want it to do something new.
Corollary: The only software that's worth making is software that does something new.
Thanks for your help,

As suggested by Tiago, we ran a test with an increased file descriptor limit (it was 1024) while tracking the number of open connections during the script run: it goes up to 4500. So increasing this limit fixes our problem.
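For anyone who wants to repeat this kind of test from within the script, here is a hedged sketch of one possible approach. It assumes Linux (it reads `/proc/self/fd`) and uses the standard `resource` module; the helper names and the target of 8192 are illustrative only and are not what we used at SOLEIL.

```python
import os
import resource


def count_open_fds():
    """Count file descriptors open in this process (Linux only, via /proc).

    The count includes the descriptor briefly used to list the directory.
    """
    return len(os.listdir("/proc/self/fd"))


def raise_soft_limit(target):
    """Raise the soft open-files limit to target, capped at the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough, nothing to do
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


print("soft limit now: %s" % raise_soft_limit(8192))
print("open descriptors: %d" % count_open_fds())
```

Calling `count_open_fds()` every few hundred devices inside the loop gives a rough picture of how the connection count grows. An unprivileged process can only raise its soft limit up to the hard limit, so a higher ceiling still has to come from the system configuration.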

But why are there so many connections open at the same time when this script is purely sequential? Is there a way to force the connections to close?

FYI, we also tested adding a "while True" loop at the end of the script, and it takes around 2 minutes for all the connections to be closed.

Cheers,

Gwen.



Hi Gwen, this looks like an interesting phenomenon. Maybe it is worth adding this kind of test to the benchmark tool?
(https://github.com/tango-controls/sys-tango-benchmark)

Up to now, it provides the opposite kind of tests, which more or less check the limits on concurrent client connections to the same device.

Best regards,
Piotr
 