No heartbeat error on Event Subscription

Hi,

I am exploring Tango events. I have created a device server "JDeviceforEvent" that has a single attribute named "Speed" with the following configuration.

Attribute: Speed
Attribute Type: DevDouble
Read/Write Type: READ_WRITE
isPolled: true
pollingPeriod: 3000 ms
pushChangeEvent: false
checkChangeEvent: true
changeEventAbsolute: "1"

I've written a simple client which subscribes to the change event of the Speed attribute.

I run the device server and then run the client. When the client subscribes for the first time, I get an event as expected. But every 10 s I get an error stating "No heartbeat from dserver".

I have tested it on the TANGO VirtualBox image and it works as expected there, i.e. whenever the value of the Speed attribute changes by an absolute value of 1 or more, an event is raised. I don't get any heartbeat exception.
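
To illustrate the criterion described above (an event once the value has moved by 1 or more), here is a minimal sketch in plain Java. It is not the Tango implementation: the class and method names are hypothetical, and treating the first polled value as a change and comparing against the last reported value are assumptions made for the example.

```java
/** Minimal sketch of an absolute-change filter; NOT the Tango implementation. */
final class AbsoluteChangeFilter {
    private final double absChange; // e.g. 1.0, mirroring changeEventAbsolute: "1"
    private Double lastSent;        // null until a first value has been reported

    AbsoluteChangeFilter(double absChange) {
        this.absChange = absChange;
    }

    /** Returns true when the polled value should be reported as a change event. */
    boolean shouldFire(double polledValue) {
        // Assumption: the first value always fires, later values are compared
        // with the last value that fired (not with the previous poll).
        if (lastSent == null || Math.abs(polledValue - lastSent) >= absChange) {
            lastSent = polledValue;
            return true;
        }
        return false;
    }
}
```

With a threshold of 1.0, a sequence of polled values 10.0, 10.5, 11.0 would report 10.0 and 11.0 and skip 10.5.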

I have already posted this issue on the mailing list. They found that it was caused by a bug, fixed it, and provided a new client API. But I am still not able to resolve the issue. Apparently it is because of my network configuration, but I don't understand what changes should be made to the network configuration in order to resolve this.

Please help me in resolving the issue.

I've attached the code for my device server, client and the modified client API. I'm using JTango-9.0.3 jar.

Note: The suggestion I received on the mailing list for using the modified client API was to put TangORB-9.0.3-a.jar at the beginning of the CLASSPATH.
Hi,

It seems that I'm not able to attach the jar file for the modified client API. You can use the following link to download the modified TangORB-9.0.3-a.jar file:

http://ftp.esrf.fr/pub/cs/tango/TangORB/

Regards,
Vatsal Trivedi
Hi Vatsal,

Have you managed to work around this problem?

From the mailing list I understand that the issue is reproducible only on boxes with two or more network interfaces. Quite unhelpful, as our production boxes normally have two.

In addition, after the No_Heartbeat error the client still gets a value that is read synchronously (ZmqEventConsumer.java:576), which is confusing.

Regards,

Igor.

Hi Igor,

In the mail trail, it was pointed out that the issue can be reproduced if the client runs on a machine with more than one network interface. The new client API which was provided didn't help in resolving the issue. It was then concluded that the issue is somehow related to my network configuration.

I'm still not able to resolve it. Your help would be greatly beneficial.

I'm running the Tango Server and the client on the same machine. System configuration is as below:

IP Address :-> 192.168.118.210
Hostname :-> PC5-HP
TANGO_HOST :-> 192.168.118.210:20000

I've attached a snapshot of "ipconfig /all" and the "hosts" file of my Windows system. I hope it helps you understand the network configuration.

I've set the logging level to "TRACE" for the Tango Device Server and attached the log generated by the device server. I'm also attaching the output of the Tango client. The server log states "Heartbeat sent for tango://PC5-HP.ncra.tifr.res.in:20000/dserver/jdeviceforevent/jdevt1.heartbeat", but somehow the heartbeat is not reaching the client.
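
Since the thread suspects the multi-interface setup, one way to cross-check the "ipconfig /all" output from the JVM's point of view is to list the active interfaces and see which addresses a host name resolves to. The sketch below uses only the standard `java.net` API; the class name `NetworkReport` is hypothetical, and it is a diagnostic aid only, not a fix and not part of Tango.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic helper; shows what the JVM sees, nothing Tango-specific. */
final class NetworkReport {

    /** Returns one "interfaceName -> address" entry per address of every interface that is up. */
    static List<String> activeAddresses() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // older JVMs may return null when nothing is configured
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    result.add(nic.getName() + " -> " + addr.getHostAddress());
                }
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        activeAddresses().forEach(System.out::println);
        // Pass the host name to check as the first argument,
        // e.g. the one printed in the server log; defaults to the local host name.
        String host = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();
        for (InetAddress a : InetAddress.getAllByName(host)) {
            System.out.println(host + " resolves to " + a.getHostAddress());
        }
    }
}
```

If the name from the server log resolves to an address on a different interface than the one in TANGO_HOST, that mismatch would be worth mentioning on the mailing list.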

I also get a strange error stating "device tango/admin/pc5-hp not defined in the database" in the "Command Prompt" from which I start the Tango Database server. I'm not sure whether the issue is because of it. I've also attached the snapshot of the same.

Please help in resolving the issue. As I'm not able to use events I'm not able to use half of the major functionality provided by the Tango Control System Framework.

Regards,
Vatsal Trivedi
Hi Vatsal,

Thank you very much for all the information.

Here is a description of how I worked around this particular problem.

The event system seems to be working even though you get this API_NoHeartbeat exception, so I decided to just ignore it:


// Event listener defined as a field
private TangoEventListener<Long> tikTakListener = new TangoEventListener<Long>() {
    @Override
    public void onEvent(EventData<Long> data) {
        // do stuff
    }

    @Override
    public void onError(Exception cause) {
        // Ignore the heartbeat error; getMessage() may be null, so guard against it
        String message = cause.getMessage();
        if (message != null && message.contains("API_NoHeartbeat")) return;
        // Otherwise log the error and set the state to FAULT
        logger.error(cause.getMessage(), cause);
        setState(DevState.FAULT);
    }
};
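
The string check in the listener can be factored into a small helper that is also safe against a null message and looks through wrapped exceptions. This is only a sketch in plain Java: the class `HeartbeatErrors` is hypothetical, and it assumes the "API_NoHeartbeat" reason appears somewhere in the message of the exception or of one of its causes, which is not confirmed in this thread for every client API version.

```java
/** Hypothetical helper; assumes the reason string shows up in an exception message. */
final class HeartbeatErrors {
    static final String NO_HEARTBEAT = "API_NoHeartbeat";

    /** Returns true if the error, or any exception in its cause chain, mentions API_NoHeartbeat. */
    static boolean isNoHeartbeat(Throwable error) {
        // Bound the walk so that a cyclic cause chain cannot loop forever.
        int depth = 0;
        for (Throwable t = error; t != null && depth < 16; t = t.getCause(), depth++) {
            String message = t.getMessage();
            if (message != null && message.contains(NO_HEARTBEAT)) {
                return true;
            }
        }
        return false;
    }
}
```

In `onError` the first check would then read `if (HeartbeatErrors.isNoHeartbeat(cause)) return;`.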

I also started a new branch, TangORB-9.1.1.hzg, where I removed the synchronous reads from the remote Tango, both when NoHeartbeat happens and when the client subscribes.

So now I have proper behavior in my test cases (and hopefully in production this week):
- client subscribes for an attribute change
- once server starts pushing events client gets them
- when server stops pushing client does not get anything

This works fine. Currently the client is on a Windows machine with two network interfaces and the server is on Debian 7, also with two network interfaces.

Though it is neither a fix nor a real understanding of why this NoHeartbeat is happening, it seems to be a workaround for us.

Hope this helps.

Igor.
Igor,

this sounds like a bug. Have you filed a bug report?

Andy
Andy,

No I did not. Can do it in a moment.

I would also raise an issue concerning the client API implementation, specifically the synchronous calls made when the client subscribes and when NoHeartbeat happens. This is very misleading, as the client cannot tell whether it got a value because of an event or because the API happened to read the value and pass it on. In my case the client got values even though the server did not produce anything (the server pushes events deliberately). And as the client uses this event as a trigger for a routine (data acquisition in this case), you can imagine what was happening.

So basically the API should not decide on the client's behalf to read the value synchronously; the client may do so itself in its error handler.

Same story with the first read when the client subscribes. I can imagine why this was done (Hello GUI!), i.e. the client subscribes, conveniently gets a value, displays it and then waits for a change. But this should be done by the client explicitly: the client reads the value, displays it, subscribes for changes and waits.
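
The contract proposed here can be sketched with a toy model in plain Java. Everything in it is hypothetical (`PushOnlySource`, `ExplicitClient`, the log strings); it is not the Tango client API, only an illustration of "subscribe delivers pushed values only, the initial read is the client's own explicit step".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.DoubleSupplier;

/** Hypothetical event source that delivers only values pushed by the server. */
final class PushOnlySource {
    private final List<Consumer<Double>> listeners = new ArrayList<>();

    void subscribe(Consumer<Double> listener) {
        listeners.add(listener); // note: no implicit synchronous read here
    }

    void push(double value) {
        for (Consumer<Double> listener : listeners) {
            listener.accept(value);
        }
    }
}

/** Hypothetical client that performs the initial read itself, then waits for events. */
final class ExplicitClient {
    final List<String> log = new ArrayList<>();

    void start(DoubleSupplier reader, PushOnlySource source) {
        log.add("read " + reader.getAsDouble());       // explicit initial read
        source.subscribe(v -> log.add("event " + v));  // from now on, real events only
    }
}
```

With this split, every "event" entry in the log corresponds to something the server actually pushed, so an event can safely be used as a trigger. A real client would still have to consider a value changing between the read and the subscription.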

What do you think? Does the C++ implementation have the same contract?

Igor.
Igor,

thanks.

Yes C++ has the same behaviour concerning events. I agree with you it is confusing and hides the fact that events are not coming through sometimes. Your proposal sounds reasonable but it might be difficult to change now because a number of GUIs depend on this behaviour. It should at least be discussed with the community to see if changing the behaviour is possible in a future release.

Andy
Hi Igor,

Thanks for developing a workaround to the issue.

As per my understanding the client uses the Heartbeat event to make sure the Device Server which is publishing the event is still alive. But currently there is no way to know at the client end whether the missing heartbeat is because the device server is dead or because the heartbeat has got lost. Considering this your solution is acceptable.
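
A minimal sketch of such a liveness check in plain Java is shown below. It is not how the Tango client implements it: the class `HeartbeatWatchdog` is hypothetical, and the 10 s timeout in the usage note is an assumption taken from the interval at which the error appears in this thread.

```java
/** Hypothetical heartbeat watchdog; time is passed in so the logic is easy to test. */
final class HeartbeatWatchdog {
    private final long timeoutMillis;
    private long lastBeatMillis;

    HeartbeatWatchdog(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastBeatMillis = nowMillis;
    }

    /** Called whenever a heartbeat arrives from the server. */
    void onHeartbeat(long nowMillis) {
        lastBeatMillis = nowMillis;
    }

    /** True when no heartbeat has been seen for longer than the timeout. */
    boolean isOverdue(long nowMillis) {
        return nowMillis - lastBeatMillis > timeoutMillis;
    }
}
```

A client would create it with `new HeartbeatWatchdog(10_000L, System.currentTimeMillis())` and poll `isOverdue`. The check only sees the absence of heartbeats, so a dead server and a heartbeat lost on the network look exactly the same to it.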

I hope that the issue gets fixed by the time of the Tango 9 release.

Also I want to confirm my understanding regarding the way in which the heartbeat event mechanism works (is implemented) in Tango.

The heartbeat event raised by the device server is first sent to the device of the DServer class residing in the Device Server process, and the DServer device then forwards it to all the clients that have subscribed for events. I inferred this from the following line of the Device Server log:

DEBUG 2015-08-10 12:41:14,259 [Event HeartBeat - dserver/JDeviceForEvent/jdEvt1] org.tango.server.events.EventManager.run:603 - Heartbeat sent for tango://PC5-HP.ncra.tifr.res.in:20000/dserver/jdeviceforevent/jdevt1.heartbeat


Similarly I believe that the subscription request sent by the client would come to the DServer device and then the DServer device will make some changes like adding the name of the client in a list.

Is my understanding correct ?

Also, is the behavior the same for all events raised by the Device Server, or is it specific to the heartbeat event?

Once again, I appreciate the effort you put into resolving the issue.

Regards,
Vatsal
Dear Vatsal,

I discussed the event issue with our Java expert here and he confirms that this feature works and is used extensively here. This means the problem you are encountering is either a bug or specific to your setup. The workaround from Igor will not solve the problem. Your problem is that you are not getting any events; the heartbeat error is simply a symptom of this. You are right, the heartbeat is there to check that the device server is alive. I don't know the exact details of the implementation, but your assumption that the DServer common admin device sends the heartbeat sounds logical.

To find out why events are not working, could you fire up atkpanel on your device and check what the errors are in the error log and what the View -> Diagnostics window says about support for events for your device attributes?

I see you are on Windows - have you switched the firewall off? If I think of any other reasons why events could not be working and how you can check I will let you know.

Kind regards

Andy
 