Thread hold GIL(Global interpreter Lock) around 6-12 second while reading from proxy of remote DS

Dear All,

We are using Python 2.7.12, Tango 9.2 ,PyTango 8.1.8 and taurus 3.7. In our application architecture we have two threads, the main Thread handles GUI updates operation and other thread handles data collection operations. In data collection thread main task is to create device proxy for local as well as remote machine and also collect attribute list(by using get_attribute_list() methods). we are updating the attribute list and it's properties on regular time interval(5 sec).

We have observed that while reading the attribute or getting attribute list from remote device server, we are suspecting that it holds GIL for 6-12 second. This issue in-turn hangs GUI because main Thread is not getting GIL for that particular time.

I have attached example and log file to support above scenario.In which, I have created two thread, main thread prints Count value and second thread creates device proxy and read attribute repetitively.In this example we observed that main thread get GIL after 10-12 secs.Please go through log file for same.
This is some portion of log file :

2018-03-08 18:59:07,223 MainThread : Thread: MainThread,count:787
2018-03-08 18:59:07,216 Thread-1 : Exception occurred
2018-03-08 18:59:07,223 Thread-1 : Reading attribute Date:
2018-03-08 18:59:13,233 Thread-1 : Exception occurred
2018-03-08 18:59:13,233 Thread-1 : Reading attribute Date:
2018-03-08 18:59:19,244 Thread-1 : Exception occurred
2018-03-08 18:59:19,244 Thread-1 : Reading attribute Date:
2018-03-08 18:59:13,233 MainThread : Thread: MainThread,count:786
2018-03-08 18:59:25,254 Thread-1 : Exception occurred
2018-03-08 18:59:25,254 Thread-1 : Reading attribute Date:
2018-03-08 18:59:25,254 MainThread : Thread: MainThread,count:785
2018-03-08 18:59:31,265 Thread-1 : Exception occurred
2018-03-08 18:59:31,265 MainThread : Thread: MainThread,count:784

Explanation about logs:
In between main thread count 786 to 785, GIL was held by thread 1 which is around 12 second.This is observed throughout the code.

Is this type of behavior expected ? is there anything wrong in our implementation?
If this is the desired behavior then can we have attribute read method with timeout parameter.


Thanks & Regards,
TCS GMRT



Regards,
TCS_GMRT
As a general rule, libtango and pytango should share the same major and minor version (for a version X.Y.Z, X and Y should match).

Could you try with pytango 9.2.2 and see if it makes any difference?

Thanks,
Thanks VIncent for prompt reply.

I have tried with PyTango 9.2.2 but not get any help for above issue.The issue is still observed while reading attribute.

This is some portion of logs:

2018-03-09 19:54:39,889 MainThread : Thread: MainThread, count:954540
2018-03-09 19:54:39,884 Thread-1 : Exception occured
2018-03-09 19:54:39,890 Thread-1 : Reading attribute Date:
2018-03-09 19:54:48,910 Thread-1 : Exception occured
2018-03-09 19:54:48,910 MainThread : Thread: MainThread, count:954539

In between count value count:954540 to count:954539, there is almost 9 seconds gap which indicate that Thread 1 is holding GIL for that period of time.

Please correct me if my understanding is wrong or I am not missing some step.Please go through attached code snippet and correct me if it need to update.

Thanks & Regards,
Hitesh Patel
Regards,
TCS_GMRT
Thanks for trying with the last pytango release, unfortunately I can't reproduce your issue.

How long does it take for you before the main thread gets stuck?

Also, what OS do you use exactly? And what happens on the server side? Is the exception a timeout?

Hi,

  • The read_attribute(s), write_attribute(s), write_read_attribute(s) and command_inout methods in PyTango all release the GIL.
  • The get_attribute_list, get_attribute_info are in fact not releasing the GIL. These calls are handled internally by TANGO and usually they are quite fast (sorry for the lack of precision in the language). They don't normally justify a delay of seconds if the server is working properly.
  • Feel free to make a PR on PyTango if you need the extra speed in these calls. I will be happy to review it.

  • Can you try a simple pytango client (no gui) to see how the system behaves?

    Do you know where the Exception occured message is coming from?

    Don't be so quick to blame the GIL. I would bet more on a taurus lock holding the application than the GIL.

    Hi again,

    Sorry but I didn't see the attachment in your initial post.
    Please ignore the taurus lock statement and the request to make a simple pytango client. I got confused when you posted you were using taurus. I will have a better look at code.
    Hi TCoutinho,

    Thanks for your detailed response for GIL releasing in PyTango. Please go through attached code snippet.

    I have created worker thread from main thread in which I have created client for remote device server using deviceProxy API and repetitively reading Date attribute of that device.while testing we have observed that if server is working slowly then reading attribute from thread is holding GIL because at that time main thread is not printing count. If server is working normally then this issue is not observed.
    So, we are looking for some timeout parameter while reading method.

    please let me know if you want any other information.

    Thanks,
    Hitesh
    Regards,
    TCS_GMRT
    Could you replace:
    dev.Date
    with:
    dev.read_attribute('Date')
    and see if it helps?

    Thanks
    Edited 4 months ago
    Hi Vincent M,

    Thanks for your suggestion. I will try your suggestion and update the status.

    Actually,we have already changed the way of reading attribute.we have used PyTango.futures mode as GreenMode which is helping us to have attribute result in predefined timeout parameter.

    we have used as reading method as >> dev.read_attribute('Date',wait=True,timeout=1)

    The above way has helped us to have attribute result within predefined timeout or else timeout exception.

    Thanks,
    Hitesh
    Regards,
    TCS_GMRT
    The reason I asked this is because the "dev.<attribute_name>" interface calls "get_attribute_list" which, according to Tiago, does not release the GIL. In theory, this call should not take seconds, but maybe something is off with your server. On the other hand, "read_attribute" does release the GIL properly. So depending on your results, we should be able to tell you whether your problem comes indeed from the combination of "get_attribute_list", the GIL, and your server (which is my main bet), or if we should look in another direction.

    Cheers
     
    Register or login to create to post a reply.