PyTango group write strange behaviour

# 5 years ago
paul.bell	Hello, I see some strange behaviour when writing an attribute on a tango group, unless I have misunderstood how it should work. The real situation is that I have a group of approx 200 power supplies, and I want to write to the Current attribute at 10Hz. The group reply is generally fast enough but sometimes it takes longer than 0.1s, and in this case I thought I could use the timeout in the write_attribute_reply to just ignore this, as I don't care about the reply. Over time however the memory increases towards 100pc. (https://pytango.readthedocs.io/en/v9.3.0/client_api/group.html#tango.Group.write_attribute_reply) I can illustrate with a dummy device that provides a Current attribute, that I have hard coded to take 0.1s to write. With the client below I make a group of 10, then try relatively quickly - the sleep in the loop is only 0.001s. This is more extreme than the real case where the loop time there is actually comparable to the write time. If I have no timeout the write time is always around 0.1s as expected. If I set the timeout to 0.001, the write time becomes something like 0.00035, which is strange as I would expect 0.001. Also the memory use shoots up. More oddly, if I change the timeout to 0.002 or above, the write time becomes around 0.12. So even slower than if I had no timeout, though without a memory leak. I can imagine if you keep the replies and write faster than you consume them, you will build up a buffer of replies in memory. But I would have thought that the point of the timeout is to throw these away. Many thanks for any advice (pytango 9.2.1 and 9.3.0) Paul Client code: import PyTango import time test_group = PyTango.Group("test") test_group.add("simulation/dummyps/1") #repeat for dummy power supplies 2 to 10 vals = [1,2,3,4,5,6,7,8,9,10] while True: time.sleep(0.001) now = time.time() id = test_group.write_attribute_asynch("Current",vals,multi=True) reply = test_group.write_attribute_reply(id, 10) # 10ms = 0.001s to wait print "wrote in ", time.time()-now

# 5 years ago

Hello,

I see some strange behaviour when writing an attribute on a tango group, unless I have misunderstood how it should work. The real situation is that I have a group of approx 200 power supplies, and I want to write to the Current attribute at 10Hz. The group reply is generally fast enough but sometimes it takes longer than 0.1s, and in this case I thought I could use the timeout in the write_attribute_reply to just ignore this, as I don't care about the reply. Over time however the memory increases towards 100pc. (https://pytango.readthedocs.io/en/v9.3.0/client_api/group.html#tango.Group.write_attribute_reply)

I can illustrate with a dummy device that provides a Current attribute, that I have hard coded to take 0.1s to write. With the client below I make a group of 10, then try relatively quickly - the sleep in the loop is only 0.001s. This is more extreme than the real case where the loop time there is actually comparable to the write time.

If I have no timeout the write time is always around 0.1s as expected. If I set the timeout to 0.001, the write time becomes something like 0.00035, which is strange as I would expect 0.001. Also the memory use shoots up. More oddly, if I change the timeout to 0.002 or above, the write time becomes around 0.12. So even slower than if I had no timeout, though without a memory leak.

I can imagine if you keep the replies and write faster than you consume them, you will build up a buffer of replies in memory. But I would have thought that the point of the timeout is to throw these away.

Many thanks for any advice (pytango 9.2.1 and 9.3.0)

Paul

Client code:

import PyTango
import time

test_group = PyTango.Group("test")
test_group.add("simulation/dummyps/1")
#repeat for dummy power supplies 2 to 10

vals = [1,2,3,4,5,6,7,8,9,10]

while True:
time.sleep(0.001)
now = time.time()
id = test_group.write_attribute_asynch("Current",vals,multi=True)
reply = test_group.write_attribute_reply(id, 10) # 10ms = 0.001s to wait
print "wrote in ", time.time()-now

# 5 years ago
Reynald	Hi Paul, There might be something missing in the Group API. When using DeviceProxy to execute some asynchronous calls, if you don't care about the answer, you should use tango.DeviceProxy.cancel_asynch_request to notify that you are not interested (or no longer interested) in knowing whether the call succeeded or not. This call will also free some resources and would solve your memory leak issue if you were using DeviceProxy objects directly. You can refer to cppTango Doxygen doc too to get more details about this method. I paste here what the doc says: virtual void Tango::Connection::cancel_asynch_request ( long id ) Cancel a pending asynchronous request. id is the asynchronous call identifier. This is a call local to the client. It simply allows the caller not to get the answer of the asynchronous request. It does not interrupt the call execution on the remote device. Maybe there is something wrong with the current implementation of Tango Groups because I don't see anything similar to DeviceProxy::cancel_asynch_request() invoked anywhere. It looks like, at first sight, that cancel_asynch_request() is never invoked in the current implementation of the Tango Groups. I think it should be invoked when write_attribute_reply is called with a timeout and failed because of this timeout because the reply was not yet received from at least one of the devices from the group so the asynchronous request object is not freed, or we should give the possibility to the user to try to read again the reply a bit later or to cancel it (<=> I don't care any more about this asynchronous request). The current implementation does not seem to allow that. One thing you could try is to use another timeout, the client timeout that you can specify when adding elements to the group (See PyTango doc). This is actually the CORBA client timeout. You can also use tango.Group.set_timeout_millis to set it. The timeout in write_attribute_reply() is different than this one because it corresponds to the maximum time to wait for the reply from the moment when write_attribute_reply() is called. This might solve your memory leak issue because in this case the asynchronous request object should be correctly freed I think. Hoping this helps, Reynald Rosenberg's Law: Software is easy to make, except when you want it to do something new. Corollary: The only software that's worth making is software that does something new. Edited 5 years ago

# 5 years ago

Reynald

Hi Paul,

There might be something missing in the Group API.

When using DeviceProxy to execute some asynchronous calls, if you don't care about the answer, you should use tango.DeviceProxy.cancel_asynch_request to notify that you are not interested (or no longer interested) in knowing whether the call succeeded or not.
This call will also free some resources and would solve your memory leak issue if you were using DeviceProxy objects directly.

You can refer to cppTango Doxygen doc too to get more details about this method.

I paste here what the doc says:

virtual void Tango::Connection::cancel_asynch_request ( long id )

Cancel a pending asynchronous request. id is the asynchronous call identifier. This is a call local to the client. It simply allows the caller not to get the answer of the asynchronous request. It does not interrupt the call execution on the remote device.

Maybe there is something wrong with the current implementation of Tango Groups because I don't see anything similar to DeviceProxy::cancel_asynch_request() invoked anywhere.

It looks like, at first sight, that cancel_asynch_request() is never invoked in the current implementation of the Tango Groups.
I think it should be invoked when write_attribute_reply is called with a timeout and failed because of this timeout because the reply was not yet received from at least one of the devices from the group so the asynchronous request object is not freed, or we should give the possibility to the user to try to read again the reply a bit later or to cancel it (<=> I don't care any more about this asynchronous request). The current implementation does not seem to allow that.

One thing you could try is to use another timeout, the client timeout that you can specify when adding elements to the group (See PyTango doc). This is actually the CORBA client timeout. You can also use tango.Group.set_timeout_millis to set it.
The timeout in write_attribute_reply() is different than this one because it corresponds to the maximum time to wait for the reply from the moment when write_attribute_reply() is called.
This might solve your memory leak issue because in this case the asynchronous request object should be correctly freed I think.

Hoping this helps,
Reynald

Rosenberg's Law: Software is easy to make, except when you want it to do something new.
Corollary: The only software that's worth making is software that does something new.

Edited 5 years ago

# 5 years ago
paul.bell	Hi Reynald, Many thanks for your reply. I have tried with the set_timeout_millis on the client side, and things get even stranger. In the fake PS device I make the write take 0.13s = 130ms. If I set the timeout above this, the time taken to wait for the replies is 0.13 as expected. But no matter how low I reduce the timeout, the time never becomes less than 0.1s. And after a while of running the loop, it changes to 0.13: …. wrote in 0.102442026138 wrote in 0.102761983871 wrote in 0.102771997452 wrote in 0.102611780167 wrote in 0.102468013763 wrote in 0.127467155457 wrote in 0.131219863892 wrote in 0.130242109299 …. I suppose I could stop using the group and loop over the device proxies directly, hoping that the overhead of the loop is small if I can at least set the timeout on the reply from each one. cheers, Paul

# 5 years ago

paul.bell

Hi Reynald,
Many thanks for your reply. I have tried with the set_timeout_millis on the client side, and things get even stranger. In the fake PS device I make the write take 0.13s = 130ms. If I set the timeout above this, the time taken to wait for the replies is 0.13 as expected. But no matter how low I reduce the timeout, the time never becomes less than 0.1s. And after a while of running the loop, it changes to 0.13:
….
wrote in 0.102442026138
wrote in 0.102761983871
wrote in 0.102771997452
wrote in 0.102611780167
wrote in 0.102468013763
wrote in 0.127467155457
wrote in 0.131219863892
wrote in 0.130242109299
….
I suppose I could stop using the group and loop over the device proxies directly, hoping that the overhead of the loop is small if I can at least set the timeout on the reply from each one.
cheers,
Paul

# 5 years ago
paul.bell	Hello again, Further to the above, instead of a group I tried iterating over a list of DeviceProxies and calling write_attribute_asynch on each. If I call the cancel_asynch_request it seems to do what I want, ie no memory leak, but only if I call the write at a lower frequency than the hardware limit (ie not too fast for the h/w). If I loop at a higher rate trying to write, then over time the call to write_attribute_asynch itself gets slower and slower until its at the hardware response time. This is the extreme case where I know the write will always be too slow for the loop frequency I am running, but it also implies that if I run long enough in a loop for which the hardware is _occasionally_ too slow, it will slow down over time. If I make sure never to write faster than the h/w response time, it seems to defeat the purpose of not waiting for the response. initial typical loop time: wrote in 0.00325393676758 gradual transition towards, after some seconds: wrote in 0.0102760791779 Here I was looping with a period of 5ms writing to a dummy Tango DS that has a 10ms write time, but I called the cancel request after sending each write. In the first iterations of the loop the loop time is fast, but it tends towards 0.01s If I loop with 10ms period or greater, the loop time remains short, 0.003 approx. But perhaps I am misunderstanding the fundamentals and it makes no sense to do this, write to hardware faster than you know it will respond, repeatedly over an extended period time. In this way the test is unrealistic. Apologies for the long text! cheers, Paul

# 5 years ago

paul.bell

Hello again,
Further to the above, instead of a group I tried iterating over a list of DeviceProxies and calling write_attribute_asynch on each. If I call the cancel_asynch_request it seems to do what I want, ie no memory leak, but only if I call the write at a lower frequency than the hardware limit (ie not too fast for the h/w). If I loop at a higher rate trying to write, then over time the call to write_attribute_asynch itself gets slower and slower until its at the hardware response time. This is the extreme case where I know the write will always be too slow for the loop frequency I am running, but it also implies that if I run long enough in a loop for which the hardware is _occasionally_ too slow, it will slow down over time. If I make sure never to write faster than the h/w response time, it seems to defeat the purpose of not waiting for the response.

initial typical loop time:
wrote in 0.00325393676758
gradual transition towards, after some seconds:
wrote in 0.0102760791779

Here I was looping with a period of 5ms writing to a dummy Tango DS that has a 10ms write time, but I called the cancel request after sending each write. In the first iterations of the loop the loop time is fast, but it tends towards 0.01s If I loop with 10ms period or greater, the loop time remains short, 0.003 approx.

But perhaps I am misunderstanding the fundamentals and it makes no sense to do this, write to hardware faster than you know it will respond, repeatedly over an extended period time. In this way the test is unrealistic.

Apologies for the long text!

cheers,

Paul

# 5 years ago
Reynald	Hi Paul, cancel_asynch_request will avoid the memory leak but will not prevent the device server to execute the write_attribute_request, which takes 10 ms. I think what you are observing is the fact that CORBA can handle a limited number of concurrent requests at a given time. When you start your client, for each write_attribute_asynch() request, CORBA will create a new thread on the server side to handle the request. This is why you see you can send requests faster than the hardware response time at the beginning. But when the number of concurrent threads reaches maxServerThreadPerConnection omniORB configuration parameter (set to 100 by default), no extra thread will be created to handle new requests (which probably have to wait for one of the 100 threads to be available to handle the new request). 1 thread will be freed every 10 ms in your example, so this is probably what you are observing. You could try to increase the maxServerThreadPerConnection omniORB configuration parameter (on the server side) to see whether what you are observing is really linked to that. Here is what the omniORB documentation says about this configuration parameter: maxServerThreadPerConnection default = 100 If the client multiplexes several concurrent requests on a single connection, omniORB uses extra threads to service them. This parameter specifies the maximum number of threads that are allowed to service a single connection at any one time. You could try to do: `export ORBmaxServerThreadPerConnection=200` before starting the device server to see whether this slightly delays the time when write_attribute_asynch() calls start to be blocking, to confirm this theory. Hoping this helps, Reynald Rosenberg's Law: Software is easy to make, except when you want it to do something new. Corollary: The only software that's worth making is software that does something new. Edited 5 years ago

# 5 years ago

Reynald

Hi Paul,

cancel_asynch_request will avoid the memory leak but will not prevent the device server to execute the write_attribute_request, which takes 10 ms.
I think what you are observing is the fact that CORBA can handle a limited number of concurrent requests at a given time.
When you start your client, for each write_attribute_asynch() request, CORBA will create a new thread on the server side to handle the request. This is why you see you can send requests faster than the hardware response time at the beginning.
But when the number of concurrent threads reaches maxServerThreadPerConnection omniORB configuration parameter (set to 100 by default), no extra thread will be created to handle new requests (which probably have to wait for one of the 100 threads to be available to handle the new request). 1 thread will be freed every 10 ms in your example, so this is probably what you are observing.

You could try to increase the maxServerThreadPerConnection omniORB configuration parameter (on the server side) to see whether what you are observing is really linked to that.

Here is what the omniORB documentation says about this configuration parameter:

maxServerThreadPerConnection default = 100
If the client multiplexes several concurrent requests on a single connection,
omniORB uses extra threads to service them. This parameter specifies the maximum
number of threads that are allowed to service a single connection at any
one time.

You could try to do:

export ORBmaxServerThreadPerConnection=200

before starting the device server to see whether this slightly delays the time when write_attribute_asynch() calls start to be blocking, to confirm this theory.

Hoping this helps,
Reynald

Rosenberg's Law: Software is easy to make, except when you want it to do something new.
Corollary: The only software that's worth making is software that does something new.

Edited 5 years ago

# 5 years ago
Andy	Hi Paul. why do you need to send requests faster than the hardware can respond? This does not make much sense unless it is for a short period. If it is for a sustained period you will get a pileup of requests. What is your use case? An alternative is to configure the polling to read your hardware at the maximum frequency or start your own thread to read at the maximum frequency the hardware can support and then use the device caching implemented in Tango to fill the device cache. Clients will then by default get the latest value from the cache instead of piling up waiting for the hardware to respond. Andy

# 5 years ago

Andy

Hi Paul.

why do you need to send requests faster than the hardware can respond? This does not make much sense unless it is for a short period. If it is for a sustained period you will get a pileup of requests. What is your use case?

An alternative is to configure the polling to read your hardware at the maximum frequency or start your own thread to read at the maximum frequency the hardware can support and then use the device caching implemented in Tango to fill the device cache. Clients will then by default get the latest value from the cache instead of piling up waiting for the hardware to respond.

Andy

# 5 years ago
Reynald	Hi Andy, I think Paul wants to write attributes as fast as possible, not read. But I agree it does not seem right to constantly write attributes at a faster speed than the hardware can support. As you wrote, you will get a pileup of requests in this case. Rosenberg's Law: Software is easy to make, except when you want it to do something new. Corollary: The only software that's worth making is software that does something new.

# 5 years ago

Reynald

Hi Andy, I think Paul wants to write attributes as fast as possible, not read.
But I agree it does not seem right to constantly write attributes at a faster speed than the hardware can support.
As you wrote, you will get a pileup of requests in this case.

Rosenberg's Law: Software is easy to make, except when you want it to do something new.
Corollary: The only software that's worth making is software that does something new.

# 5 years ago
Andy	Hi Reynald, you are right - I mis-read his post! But my question remains - what is the use case? Maybe Paul needs a way to discard writes if there are too many requests piling up. But this sounds not that easy? It should be easier is to adapt the clients or to introduce a write buffer which buffers the values for writing them later. However then the client does not know when the value has been written … Cheers Andy

# 5 years ago

Andy

Hi Reynald,

you are right - I mis-read his post! But my question remains - what is the use case? Maybe Paul needs a way to discard writes if there are too many requests piling up. But this sounds not that easy? It should be easier is to adapt the clients or to introduce a write buffer which buffers the values for writing them later. However then the client does not know when the value has been written …

Cheers

Andy

# 5 years ago
paul.bell	Hello, Thanks for your replies. To answer the question about the use case, this is a feedback device that should run at 10Hz. In general the "actuators", which are power supplies, can be written at 10Hz, but there are occasional slow responses that take longer. In those cases, I would like to simply not wait for the reply, and proceed to the next iteration of the 10Hz loop. In this way the test I made is not quite the same, as here I constantly write too fast. In reality, if I skip one slow write, the next one should take less than 0.1s. This was why I thought I could use a 0.1s timeout on the group write, but encountered the memory "leak". I was thinking that instead of "a way to discard writes if there are too many requests piling up", I could just discard replies and assume that the value was written. I assume its something like if 1 in 10 writes time out, the memory increases 10x the rate than if only 1 in 100 time out, but since there will always be some fraction that do time out, over time the memory will increase. I will make a more realistic test where I can control what fraction of writes are too slow, and check the memory. cheers, Paul

# 5 years ago

paul.bell

Hello,
Thanks for your replies. To answer the question about the use case, this is a feedback device that should run at 10Hz. In general the "actuators", which are power supplies, can be written at 10Hz, but there are occasional slow responses that take longer. In those cases, I would like to simply not wait for the reply, and proceed to the next iteration of the 10Hz loop. In this way the test I made is not quite the same, as here I constantly write too fast. In reality, if I skip one slow write, the next one should take less than 0.1s. This was why I thought I could use a 0.1s timeout on the group write, but encountered the memory "leak".
I was thinking that instead of "a way to discard writes if there are too many requests piling up", I could just discard replies and assume that the value was written.

I assume its something like if 1 in 10 writes time out, the memory increases 10x the rate than if only 1 in 100 time out, but since there will always be some fraction that do time out, over time the memory will increase. I will make a more realistic test where I can control what fraction of writes are too slow, and check the memory.
cheers,
Paul

# 5 years ago
paul.bell	Hello again, This might be slightly off the topic of the original heading now, but while I continued to investigate I discovered something else with just write_attribute_asynch on single devices, not groups. I wanted to test the cancel_asynch_request method, which from what I understood from the first reply may not work with groups. So instead of making a group, I just made a list of devices and looped over them setting the attribute on each with a write_attribute_asynch call. The device I have reimplements what is already being done in matlab by the physicists. In matlab, I have 200 devices in a "devlist" and I can measure the total time to write to all of them as follows: tic; for n = 1:numel(devlist), tango_write_attribute_asynch(devlist{n},'Current',theta(n)); end; t1 = toc; The total time is t1 = 0.008982000000000. In my python device, with a list of the same 200 devices, the equivalent loop takes 0.13s in total. Indeed, I can see the time for each write_attribute_asynch is about 0.0007, just that one call in python, nothing else. So it seems that matlab is 10x faster than pytango. Honestly I am not really sure how long a write_attribute_asynch should take, but I would not expect such a difference between matlab and a pytango device. Thanks a lot for any comments Paul

# 5 years ago

paul.bell

Hello again,

This might be slightly off the topic of the original heading now, but while I continued to investigate I discovered something else with just write_attribute_asynch on single devices, not groups. I wanted to test the cancel_asynch_request method, which from what I understood from the first reply may not work with groups. So instead of making a group, I just made a list of devices and looped over them setting the attribute on each with a write_attribute_asynch call.

The device I have reimplements what is already being done in matlab by the physicists. In matlab, I have 200 devices in a "devlist" and I can measure the total time to write to all of them as follows:

tic; for n = 1:numel(devlist), tango_write_attribute_asynch(devlist{n},'Current',theta(n)); end; t1 = toc;

The total time is t1 = 0.008982000000000.

In my python device, with a list of the same 200 devices, the equivalent loop takes 0.13s in total. Indeed, I can see the time for each write_attribute_asynch is about 0.0007, just that one call in python, nothing else. So it seems that matlab is 10x faster than pytango. Honestly I am not really sure how long a write_attribute_asynch should take, but I would not expect such a difference between matlab and a pytango device.

Thanks a lot for any comments

Paul