DS started with Starter creating a huge log file

When we start a TANGO DS, say "x", via the Starter, a log file is created under "/var/tmp/ds.log/". This log file fills up very fast. However, when "x" is started from the command line (not via the Starter), no logs are created under "/var/tmp/ds.log/".

Screenshot of the location attached.

TANGO version: 9.2.5

Note: I have also posted the same concern on Github (https://github.com/tango-controls/starter/issues/3)
Regards,
TCS_GMRT
Hi
It is not a bug. It is a feature.
This file contains all stderr messages written by the server.
It is used to understand why a server does not start or fails.
If your file is too big, that means that your server writes too many error messages.
Regards
Pascal
Thanks, Pascal for the reply.

If your file is too big, that means that your server writes too many error messages
This is a dummy scenario, only to explain:
Consider 2 devices, say "a/b/x" & "a/b/y".

Device "a/b/x" has the following code:

 ….
try {
	AttributeProxy attrProxy = new AttributeProxy("tango://<TANGO_HOST>:10000/a/b/y/attr");  // line 1
} catch (DevFailed e) {
	// TODO Auto-generated catch block
	LOG.error(e);  // line 2
} ….

The above code on "a/b/x" is executed every 2 seconds till the time attribute proxy is created.
Both the Devices are started simultaneously, say at a time "T".

Say device "a/b/x" finishes starting (returns from its initDevice() method) at T+10 seconds, and device "a/b/y" takes approximately 30 seconds to start (to return from its initDevice() method). Therefore, until T+30 seconds, every execution of "line 1" throws an exception and "line 2" is executed, because a proxy cannot be created for the device attribute "a/b/y/attr".

However, my concern is, even if there is a try-catch block for "line 1", the log file created on "/var/tmp/ds.log" logs this exception 10 times ( (30-10) seconds/ 2 seconds). I think it should not be logging as there is a catch block. Am I missing something here?

Such a scenario, in which one device "a/b/y" starts later than another device "a/b/x" that is writing an attribute, is what increases the size of the file at "/var/tmp/ds.log" for us.

Let me know if the above example helps to explain the problem/scenario.
Regards,
TCS_GMRT
Do your devices run on the same host? If this is the case, the scenario you described is where the startup levels, supported by the Starter device, come into play. Your problem can be easily solved by assigning device a/b/x a startup level greater than that of device a/b/y. The Starter will take care of serializing the startup and avoid the error.
Cheers,
Lorenzo
Do your devices run on the same host?
No, they run on different TANGO HOST.
Regards,
TCS_GMRT
Hi All,

Any help will be appreciated.
Regards,
TCS_GMRT
I can see two possible solutions:
1) remove the LOG.error() call, or put some heuristics in place to thin out the retries of the AttributeProxy instantiation
2) set up the systems (hosts) so that the TANGO devices are always running. This is, anyhow, the preferred approach with TANGO, i.e. to have the devices always running and leverage the TANGO State in order to manage the service (e.g. turn the device ON, OFF, …). This way you'll possibly face a few errors logged just during host startup.
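
The "heuristics" in option 1 could take many forms. One common choice is a capped exponential backoff: the wait between attempts doubles up to a ceiling, and only the first failure is reported. The sketch below is an illustration only, not code from this thread. BackoffRetry, delayMillis and retry are hypothetical names, and a plain Callable stands in for the AttributeProxy constructor so that the example runs without TANGO.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of TANGO): retry with capped exponential backoff. */
final class BackoffRetry {

    /** Delay before the next try after the given 0-based attempt: initial * 2^attempt, capped at max. */
    static long delayMillis(int attempt, long initialMillis, long maxMillis) {
        long delay = initialMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Calls the action until it succeeds or maxAttempts is reached; reports only the first failure. */
    static <T> T retry(Callable<T> action, int maxAttempts,
                       long initialMillis, long maxMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt == 0) {
                    // One line on stderr instead of one line per retry.
                    System.err.println("first failure, will keep retrying: " + e.getMessage());
                }
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(delayMillis(attempt, initialMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

With an initial delay of 2 seconds and a ceiling of 60 seconds, the waits would be 2, 4, 8, 16, 32, 60, 60, … seconds, so a long outage produces far fewer attempts than a fixed 2-second poll.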
2) set up the systems (hosts) so that the TANGO devices are always running. This is, anyhow, the preferred approach with TANGO, i.e. to have the devices always running and leverage the TANGO State in order to manage the service (e.g. turn the device ON, OFF, …). This way you'll possibly face a few errors logged just during host startup.
The current architecture does not allow me to implement this solution. Is there any other workaround?

It is not a bug. It is a feature.
Is there a way to bypass or suppress this feature? It is becoming a showstopper for us.

I would also appreciate it if someone could share their thoughts on this concern:
However, my concern is, even if there is a try-catch block for "line 1", the log file created on "/var/tmp/ds.log" logs this exception 10 times ( (30-10) seconds/ 2 seconds). I think it should not be logging as there is a catch block. Am I missing something here?
Regards,
TCS_GMRT
TCS_GMRT
However, my concern is, even if there is a try-catch block for "line 1", the log file created on "/var/tmp/ds.log" logs this exception 10 times ( (30-10) seconds/ 2 seconds). I think it should not be logging as there is a catch block. Am I missing something here?

Not sure I understand this… The execution goes through the catch block as long as the try fails, i.e. as long as the other device is not available online. If, as you wrote, the other device takes some time to start up/init, I'm not surprised you get some 10-15 entries in the log.
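
To make the point concrete, here is a minimal, TANGO-free sketch. It assumes that LOG.error() ends up writing to stderr (for example through a console appender), which would be consistent with the file containing all stderr messages. CaughtButLogged, connect and poll are hypothetical names, and connect() merely stands in for the failing AttributeProxy creation. Catching the exception only stops it from propagating; the line written inside the catch block still reaches stderr once per failed cycle.

```java
import java.io.PrintStream;

/** Hypothetical demo: a caught exception is still logged once per failed cycle. */
final class CaughtButLogged {

    /** Stand-in for "line 1": always fails, as if a/b/y were not yet available. */
    static void connect() throws Exception {
        throw new Exception("cannot create proxy for a/b/y/attr");
    }

    /** Runs the given number of polling cycles and returns how many lines were written to err. */
    static int poll(int cycles, PrintStream err) {
        int written = 0;
        for (int i = 0; i < cycles; i++) {
            try {
                connect();
            } catch (Exception e) {
                // Stand-in for "line 2": assumed to behave like LOG.error(e) with a stderr appender.
                err.println("ERROR " + e.getMessage());
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // (30 - 10) seconds / 2 seconds per cycle = 10 cycles
        int cycles = (30 - 10) / 2;
        System.out.println(poll(cycles, System.err) + " lines written to stderr");
    }
}
```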
However, my concern is, even if there is a try-catch block for "line 1", the log file created on "/var/tmp/ds.log" logs this exception 10 times ( (30-10) seconds/ 2 seconds). I think it should not be logging as there is a catch block. Am I missing something here?

Actually, the point I am trying to make is this: if the developer has caught the exception (line 2 in the above code snippet), then I feel TANGO should not log this error message as well, because the developer has already logged it.

If, as you wrote, the other device takes some time to startup/init, I'm not surprised you get some 10-15 entries in the log.
Agreed, there is no problem with 10-15 entries. But what if the DS will not be alive for another couple of hours because of maintenance or something similar? How do we tackle this scenario? This is the actual scenario at our site.
Regards,
TCS_GMRT
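
For the multi-hour outage case, one possible approach (a sketch, not something proposed in this thread) is to log only when the state changes: once when the remote device becomes unavailable and once when it comes back, with a count of the suppressed attempts. TransitionLogger and onResult are hypothetical names; the returned string is what the caller would pass to its own logger.

```java
/** Hypothetical helper: emits a message only on failure/recovery transitions. */
final class TransitionLogger {
    private boolean failing = false;
    private long suppressed = 0;

    /** Returns the message to log for this attempt, or null if nothing should be logged. */
    String onResult(boolean success, String detail) {
        if (success) {
            if (!failing) {
                return null; // still healthy, nothing to report
            }
            String msg = "recovered after " + (suppressed + 1) + " failed attempts";
            failing = false;
            suppressed = 0;
            return msg;
        }
        if (!failing) {
            failing = true; // first failure of this outage: report it once
            return "unavailable: " + detail;
        }
        suppressed++; // repeated failure: count it, but stay silent
        return null;
    }

    public static void main(String[] args) {
        TransitionLogger transitions = new TransitionLogger();
        // Two hours of polling every 2 seconds = 3600 failed attempts, then one success.
        for (int i = 0; i < 3600; i++) {
            String msg = transitions.onResult(false, "a/b/y/attr");
            if (msg != null) System.err.println(msg);
        }
        String msg = transitions.onResult(true, "a/b/y/attr");
        if (msg != null) System.err.println(msg);
    }
}
```

Under these assumptions a two-hour outage polled every 2 seconds would leave two lines in the file instead of 3600.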
 