No heartbeat error on Event Subscription


Dear Andy,

I want to thank the community for their continuous support.

I have one more query. I hope you won't mind me asking one more question.

In a previous post it was pointed out that the value of the TANGO_HOST variable on both the client and the server side should be the same in order for the events to work.

Consider the following hypothetical scenario:

Device Server A is registered with the Database server running on system A' and Device Server B is registered with the Database server running on system B'. System A' and system B' each have their own, different TANGO database. The value of the TANGO_HOST environment variable for system A' and B' would be Hostname_A':Port_A' and Hostname_B':Port_B' respectively.

Now, on system C', I run a Tango client which subscribes to the events raised by Device Server A and Device Server B.

I have read in TANGO Control System Manual 8.1 (Appendix A - section A.12.3.1) that I can specify multiple values in the TANGO_HOST environment variable. So I should set the value of the TANGO_HOST variable on system C' as Hostname_A':Port_A',Hostname_B':Port_B'. But this value is bound to be different from the TANGO_HOST value on system A' and system B'.

In this scenario, will the client receive events from both device servers?

Note: The assumption is that there can be no communication between system A' and system B', whereas system C' can communicate with both system A' and system B'. I understand that the trivial solution is to run a database server on system C'.

Regards,
Vatsal Trivedi
Dear Vatsal,

your question is completely valid and in fact is related to your bug. The event system is designed to work for multiple TANGO_HOSTs i.e. a client C can receive events from deviceA with TANGO_HOST=A:port and from deviceB with TANGO_HOST=B:port. There is no need to define another database for them to work. This is a completely valid scenario and should work for you now already. You can test it to be sure it is the case.

FYI the bug you found when the hostname is in upper case is related to your question because to implement the multi TANGO_HOST feature we use the FQDN of the device to identify the events. This allows clients to receive events from devices with different TANGO_HOSTs even if the device names are the same i.e. the TANGO_HOST of the device is added to the device name to form the FQDN thereby making it unique across multiple TANGO_HOSTs. It was because of this that the wrong TANGO_HOST with capital letters got added to the event filter and you were not receiving events. The Java device server api uses the system host name without changing the case while the Java client api uses the system host name but changes it to lower case. That is why we asked you to change your system name to lower case. We will avoid this in the next version by passing the FQDN of the device in the device server to the client too. That way they both have the same FQDN no matter if the hostname is in upper or lower case nor if the host name is defined in the DNS server or not.
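
To make the FQDN point above concrete, here is a minimal Python sketch. It is only an illustration and not the actual Tango implementation: the helper names `full_device_name`, `normalized` and `subscribe_change_events` are made up for this example, and the host and device names in the usage note are hypothetical. The first two helpers show why a host name written in upper case produces a different name than the lower-case one unless both sides normalize it. The last helper assumes PyTango is installed and uses `DeviceProxy.subscribe_event` with fully qualified device names, so one client can address devices registered under different TANGO_HOSTs.

```python
def full_device_name(tango_host: str, device: str) -> str:
    """Build a fully qualified device name: tango://host:port/domain/family/member."""
    return f"tango://{tango_host}/{device}"


def normalized(name: str) -> str:
    """Lower-case the whole name (assumption for this sketch: both sides agree
    to compare names in lower case, so host case differences cannot matter)."""
    return name.lower()


def subscribe_change_events(targets, attribute, callback):
    """Subscribe to change events on devices that live under different TANGO_HOSTs.

    targets   -- iterable of (tango_host, device_name) pairs
    attribute -- attribute name to subscribe to
    callback  -- callable that receives the event data

    PyTango is imported lazily so the helpers above work without it.
    """
    import tango  # requires PyTango and reachable device servers

    subscriptions = []
    for tango_host, device in targets:
        proxy = tango.DeviceProxy(full_device_name(tango_host, device))
        event_id = proxy.subscribe_event(
            attribute, tango.EventType.CHANGE_EVENT, callback
        )
        # Keep the proxy alive together with its subscription id.
        subscriptions.append((proxy, event_id))
    return subscriptions


if __name__ == "__main__":
    server_side = full_device_name("HOSTA:10000", "sys/tg_test/1")
    client_side = full_device_name("hosta:10000", "sys/tg_test/1")
    print(server_side == client_side)                          # names differ by case
    print(normalized(server_side) == normalized(client_side))  # equal once normalized
```

On a real system the client on C would call something like `subscribe_change_events([("hosta:10000", "sys/tg_test/1"), ("hostb:10000", "sys/tg_test/1")], "double_scalar", print)`; those hosts, devices and the attribute name are placeholders to be replaced by your own.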

Andy
Alucard wrote:
I have read in TANGO Control System Manual 8.1 (Appendix A - section A.12.3.1) that I can specify multiple values in the TANGO_HOST environment variable. So I should set the value of the TANGO_HOST variable on system C' as Hostname_A':Port_A',Hostname_B':Port_B'. But this value is bound to be different from the TANGO_HOST value on system A' and system B'.

I forgot to answer this part of your question. DO NOT use the syntax of multiple TANGO_HOST in your case. This syntax is reserved for when you have multiple database servers for the SAME database, i.e. the same TANGO control system. Clients will try to connect to the first available database server in the list. If the one a client is connected to fails, due to a crash or whatever, the client will automatically try the next one. It will still try to find the same device in the alternate database server.

If you have distinct TANGO control systems with different devices then do not concatenate the TANGO_HOSTs. Keep them separate.
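
The failover behaviour described above can be sketched in a few lines of Python. This is a simplified illustration and not the code of the Tango client library: `parse_tango_host` and `first_available` are hypothetical helpers, and `is_alive` stands in for whatever check a real client performs. The point is that every entry in the list is treated as an alternative entry point to the SAME database, so only one of them is used at a time.

```python
def parse_tango_host(value):
    """Split 'host1:port1,host2:port2' into a list of (host, port) tuples."""
    endpoints = []
    for item in value.split(","):
        host, _, port = item.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {item!r}")
        endpoints.append((host, int(port)))
    return endpoints


def first_available(endpoints, is_alive):
    """Return the first endpoint for which is_alive(endpoint) is true, else None.

    is_alive is a caller-supplied check (hypothetical in this sketch).
    """
    for endpoint in endpoints:
        if is_alive(endpoint):
            return endpoint
    return None


if __name__ == "__main__":
    # Two database servers serving the same control system (placeholder names).
    servers = parse_tango_host("db1:10000,db2:10000")
    # Pretend db1 has crashed: the client falls back to db2.
    print(first_available(servers, lambda ep: ep[0] != "db1"))
```

With two distinct control systems, a list like this would make the client look for device A and device B in whichever single database answers first, which is why the entries must not be concatenated in that case.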

Andy
Dear all!

I'm also getting the "no heartbeat" errors.
I'm trying to set up a system using Tango 9.2.2 and ZeroMQ 4.1.4 on CentOS 7. I've got two machines: one (called tango-dev) which is running the Tango database and a Starter and another (called t92ds-template) which is just running Starter.
Both are visible from DNS and both have proper hostnames configured (lowercase and containing full addresses with domains in both cases).
The TANGO_HOST variable on both machines is set to tango-dev.cps.uj.edu.pl:10000.

Since we use almost exclusively Python and I can't find a way to subscribe to heartbeat events with PyTango, my tests involved launching Astor on both machines as well as on my own client machine (which is not in the DNS). The server machines don't have graphical libraries installed, so I'm using an X server on my own machine.

If Astor is launched on the host (tango-dev), no errors are displayed.
If it's launched on one of the other machines, I get such messages:

[admin@t92ds-template ~]$ astor 
Display is 192.168.105.108:0
====================== ZMQ (3.22) event system is available ============================
Build  GUI :1926 ms
Total time to subscribe on 2 hosts : 323 ms
Total time to start Astor 2071 ms
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/t92ds-template Not found
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/t92ds-template Not found
subscribeChangeServerEvent() for tango/admin/tango-dev/Servers OK!
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/tango-dev Not found
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/tango-dev Not found
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/t92ds-template Not found
tango://tango-dev.cps.uj.edu.pl:10000/tango/admin/tango-dev/servers.idl5_change ?  NOT FOUND
tango://tango-dev.cps.uj.edu.pl:10000/tango/admin/tango-dev/servers.idl5_change ?  NOT FOUND
tango://tango-dev.cps.uj.edu.pl:10000/tango/admin/tango-dev/state.idl5_change ?  NOT FOUND
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/tango-dev Not found
Mon Dec 19 17:01:08 CET 2016
tango/admin/t92ds-template  has received a DevFailed :	No heartbeat from dserver/starter/t92ds-template
HostStateThread.StateEventListener on tango/admin/t92ds-template
Mon Dec 19 17:01:08 CET 2016
tango/admin/tango-dev  has received a DevFailed :	No heartbeat from dserver/starter/tango-dev
HostStateThread.StateEventListener on tango/admin/tango-dev
tango-dev
Tango exception
Severity -> ERROR 
Desc -> No heartbeat from dserver/starter/tango-dev
Reason -> API_NoHeartbeat
Origin -> ZmqEventConsumer.checkIfHeartbeatSkipped()
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/t92ds-template Not found
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/tango-dev Not found
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/t92ds-template Not found
Mon Dec 19 17:01:18 CET 2016
tango/admin/t92ds-template  has received a DevFailed :	No heartbeat from dserver/starter/t92ds-template
HostStateThread.StateEventListener on tango/admin/t92ds-template
Mon Dec 19 17:01:18 CET 2016
tango/admin/tango-dev  has received a DevFailed :	No heartbeat from dserver/starter/tango-dev
HostStateThread.StateEventListener on tango/admin/tango-dev
tango-dev
Tango exception
Severity -> ERROR 
Desc -> No heartbeat from dserver/starter/tango-dev
Reason -> API_NoHeartbeat
Origin -> ZmqEventConsumer.checkIfHeartbeatSkipped()
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/tango-dev Not found
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/t92ds-template Not found
Mon Dec 19 17:01:28 CET 2016
tango/admin/t92ds-template  has received a DevFailed :	No heartbeat from dserver/starter/t92ds-template
HostStateThread.StateEventListener on tango/admin/t92ds-template
Mon Dec 19 17:01:28 CET 2016
tango/admin/tango-dev  has received a DevFailed :	No heartbeat from dserver/starter/tango-dev
HostStateThread.StateEventListener on tango/admin/tango-dev
tango-dev
Tango exception
Severity -> ERROR 
Desc -> No heartbeat from dserver/starter/tango-dev
Reason -> API_NoHeartbeat
Origin -> ZmqEventConsumer.checkIfHeartbeatSkipped()
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/tango-dev Not found
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/tango-dev Not found
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/t92ds-template Not found
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/t92ds-template Not found
Mon Dec 19 17:01:38 CET 2016
tango/admin/t92ds-template  has received a DevFailed :	No heartbeat from dserver/starter/t92ds-template
HostStateThread.StateEventListener on tango/admin/t92ds-template
Mon Dec 19 17:01:39 CET 2016
tango/admin/tango-dev  has received a DevFailed :	No heartbeat from dserver/starter/tango-dev
HostStateThread.StateEventListener on tango/admin/tango-dev
tango-dev
Tango exception
Severity -> ERROR 
Desc -> No heartbeat from dserver/starter/tango-dev
Reason -> API_NoHeartbeat
Origin -> ZmqEventConsumer.checkIfHeartbeatSkipped()
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/tango-dev Not found
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/t92ds-template Not found
Mon Dec 19 17:01:48 CET 2016
tango/admin/t92ds-template  has received a DevFailed :	No heartbeat from dserver/starter/t92ds-template
HostStateThread.StateEventListener on tango/admin/t92ds-template
Mon Dec 19 17:01:48 CET 2016
tango/admin/tango-dev  has received a DevFailed :	No heartbeat from dserver/starter/tango-dev
HostStateThread.StateEventListener on tango/admin/tango-dev
tango-dev
Tango exception
Severity -> ERROR 
Desc -> No heartbeat from dserver/starter/tango-dev
Reason -> API_NoHeartbeat
Origin -> ZmqEventConsumer.checkIfHeartbeatSkipped()
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/tango-dev Not found
tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/t92ds-template Not found
Astor exiting….
 
======== Shutting down ZMQ event system ==========


Attribute-related events work all the time, there's no problem with them. I've only tested subscribing to such events using Python and PyTango.

Do any of you have an idea what the problem might be?

Best regards,
Łukasz
Hi Łukasz

It seems that your database has an inconsistency.
Two Starter devices are defined: tango-dev and t92ds-template.
But the admin devices for these Starter servers are not defined in the database.

tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/tango-dev Not found

Regards
Pascal
Hello!

Actually, they are defined in the database, there's a picture in the attachment that proves that.
Also, as I've written in my previous message, I don't get those errors when I run Astor on the machine with database.

Best regards,
Łukasz
Hi
On your picture you can see on the right part of Jive:
host: 192.168.105.225 (192.168.105.225)

If your server is running on a host called tango-dev, you must have:
host: tango-dev (192.168.105.225)

It seems that your network configuration is not OK

Regards
Hello!

Okay, thanks for that! I will try to look into it, but I've no idea what is wrong.
TANGO_HOST is configured to "tango-dev.cps.uj.edu.pl:10000" on both machines (the one running the database and my client).

What hostname is used there? I've got only static hostname configured and what you've mentioned looks like a transient hostname.

EDIT: I've tried configuring a transient hostname but it didn't solve the issue. Also, the transient hostname is reset when the operating system starts up. Maybe that's the reason for the problem?

Best regards and thanks again for that suggestion,
Łukasz
I'm not sure if anyone has mentioned it, but we solved the problem that Łukasz was writing about.

We got rid of both the "no heartbeat" errors and the "tango://tango-dev.cps.uj.edu.pl:10000/dserver/starter/tango-dev Not found" messages.
It turned out that we needed to add the TANGO_HOST name to /etc/hosts, even though we had DNS records for it.

Moreover, it significantly sped up the start-up of Astor and of Sardana scans.

I hope this will be helpful to someone.
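
For anyone who wants to check their own machines, here is a small Python sketch using only the standard `socket` module. It is not what Jive or Astor do internally. It rests on the assumption that the "IP address (IP address)" display appears when the address cannot be resolved back to a name, and the helper names `check_resolution` and `describe` are invented for this example. It does a forward lookup of a host name and then a reverse lookup of the resulting address.

```python
import socket


def check_resolution(name):
    """Forward-resolve a host name, then reverse-resolve the address found."""
    result = {"name": name, "ip": None, "reverse": None}
    try:
        result["ip"] = socket.gethostbyname(name)
    except OSError:
        return result  # forward lookup failed
    try:
        result["reverse"] = socket.gethostbyaddr(result["ip"])[0]
    except OSError:
        pass  # no reverse entry for this address
    return result


def describe(result):
    """Format the result in the 'host: name (ip)' style quoted from Jive."""
    if result["ip"] is None:
        return f"{result['name']}: forward lookup failed"
    shown = result["reverse"] or result["ip"]
    return f"host: {shown} ({result['ip']})"


if __name__ == "__main__":
    # Replace with the host part of your own TANGO_HOST.
    print(describe(check_resolution("tango-dev.cps.uj.edu.pl")))
```

If the output shows the IP address twice, the reverse lookup is not returning a name on that machine, which is the situation an /etc/hosts entry works around.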

Pascal wrote:
Hi
On your picture you can see on the right part of Jive:
host: 192.168.105.225 (192.168.105.225)

If your server is running on a host called tango-dev, you must have:
host: tango-dev (192.168.105.225)

It seems that your network configuration is not OK

Regards

In Astor or Jive all our machines were shown as: IP address (IP address). We added DNS records for all the machines running device servers, but that didn't change anything. Only the entries in /etc/hosts changed the display to: host name (IP address). I'm not sure whether it matters for machines other than the tango host; we haven't noticed any other problems with that.

Now the question is: shouldn't it be able to use those DNS records?

Kind regards,
Michał aka Fałek
 