HDBPPES attribute errors

Hi all,

First off, thank you to everyone in the ecosystem. The amount we've been able to get done on our product line using TANGO and various other GPL code in such a short time is incredible, and it has garnered some interest from the ion implant industry, so that's kinda neat!

My question follows: I have been playing with HDBPP for archiving and historian purposes. What's kind of weird is that some attributes have archived from time to time but won't archive consistently. I hadn't really paid a lot of attention to this for a while as I was working on other things, but now that we've gotten most of our automation handling done, it's time to dig in.

I've made a sample set of hdbpp attributes calling three separate homegrown device servers. Here is the setup:

I'm running Tango 9.2.5a, installed via the Debian package libtango9, on a Lubuntu machine. HDBPPES is also running on this same host. The host's IP address is 192.168.0.100 and its host name is rarerf-pc. I've added rarerf-pc to the hosts file, mapped to 192.168.0.100, since I don't have a full-blown DNS server, just the usual crappy one in a consumer switch/router. When I run DatabaseDS.DbGetCsDbServerList, it correctly spits out rarerf-pc:10000. My TANGO_HOST environment variable is also set to rarerf-pc:10000.

Two PyTango device servers, MKSRGA class at rarerf-pc:10000/rareRF/Control/ChamberRGA and LabJackGasCab class at rarerf-pc:10000/rareRF/Control/GasCab, are running on the same machine as the Tango host. I have one other PyTango device server, ThunderOpticsOES class at rarerf-pc:10000/rareRF/Archiving/OES, which runs on Windows with Tango 9.2.4 on a separate machine (a VM). This VM has its TANGO_HOST set to 192.168.0.100:10000, as I haven't had a chance to add the host mapping to its hosts file.
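
Since the setup above mixes a host name (rarerf-pc:10000) on one machine and a raw IP address (192.168.0.100:10000) on the other, a small check run on each machine can show what each one actually resolves. The following is only a sketch using the Python standard library; the function names are made up for illustration and are not part of Tango or PyTango.

```python
import ipaddress
import socket


def split_tango_host(value):
    """Split a 'host:port' string such as 'rarerf-pc:10000' into (host, port)."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected 'host:port', got %r" % value)
    return host, int(port)


def is_ip_literal(host):
    """Return True if host is a literal IP address rather than a name."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def resolve_ipv4(host):
    """Return the sorted IPv4 addresses this machine resolves host to.

    The lookup goes through the normal system resolver, so an entry in the
    hosts file is taken into account just like a DNS answer would be.
    """
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})
```

Calling split_tango_host() on the value of the TANGO_HOST environment variable and then resolve_ipv4() on the host part, on both the Lubuntu host and the Windows VM, would show whether the two machines agree on what rarerf-pc means. It does not say anything about how the Tango event system itself picks addresses.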

All device servers work in the sense that they are able to be viewed, queried, and interacted with via Jive, ATKViewer, and itango.

I've configured HDBPP to look at various attributes on these device servers. Basically, anything that is running locally on the same host as the tango host throws a periodic event timeout error:
(Note: ignore the interpolation error. I'm using an interpolation library as a super duper hacky way to handle an ion gauge with a super funky curve when it's off. Basically, when the gauge is off, the reported voltage is off the scale, so the attribute errors out and I know the gauge is off. There are a thousand better ways to handle this and it needs to be refactored, but it works for now.)


This seems somewhat similar to what happened back when I was having issues with the ATK update rate, in the sense that events were not being received. I fixed that by making sure the Tango host was set correctly, as a stopgap until I have a chance to compile 9.3.3 and get JTango as well. It would also kind of make sense insofar as the only device server that works is the one running remotely: either any issue with device filtering is taken care of via DNS at the consumer router, or that device server is sending everything by IP address rather than by FQDN.

I'm at a bit of a loss here. I can check whether the device servers work when run on a separate machine, to validate my assumption that this is due to running device servers on the same host as the Tango host. But even if that works, I'm not sure what the solution is, short of trying to upgrade to 9.3.3 and upgrading JTango.

Any help would be appreciated!

Cheers,
Mark
Well, if this isn't the damnedest thing. I updated the hosts file on the Windows machine with 192.168.0.100 rarerf-pc and changed TANGO_HOST to rarerf-pc:10000, and suddenly, the same behavior!!! I reset the Windows TANGO_HOST to 192.168.0.100, and voila, HDBES is happy again with those two signals.

That kind of puts a nail in the coffin of my theory, eh?
Hi Mark,

I would definitely recommend upgrading to the latest cppTango 9.3.3 and PyTango 9.3.0, because we fixed many things related to the event system.
Are these attributes polled?
What is the archive event period for these attributes?

Kind regards,
Reynald
Rosenberg's Law: Software is easy to make, except when you want it to do something new.
Corollary: The only software that's worth making is software that does something new.
Hi Reynald!

Ok, off to try to figure out why ZMQ is failing during the build of 9.3.3-RC1. Looking at the cppTango PRs, it looks like there's an issue with 4.3.2; I'll pull down the newest PR and see whether that fixes it.


Yep, attributes are polled, periods are varying but anywhere from 1000ms to 15000ms.
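
The two questions above (is the attribute polled, and what is its archive event period) can be turned into a quick sanity check per attribute. This is a rough sketch only: the function name is hypothetical, and the rules encode my understanding that periodic archive events are evaluated when the attribute is polled, so treat the warnings as hints rather than as Tango's exact behavior. The input values could be read, for instance, with PyTango's DeviceProxy.get_attribute_poll_period() and from the archive_period field of the attribute's event configuration.

```python
def check_archive_settings(poll_period_ms, archive_period_ms):
    """Return a list of warnings for one attribute's archive event setup.

    poll_period_ms: polling period in milliseconds, 0 meaning "not polled".
    archive_period_ms: archive event period in milliseconds, or None if
        no period is configured.
    """
    warnings = []
    if poll_period_ms <= 0:
        # Assumption: without polling, archive events only appear if the
        # device server pushes them from its own code.
        warnings.append("attribute is not polled")
    if archive_period_ms is None:
        warnings.append("no archive period configured")
    elif poll_period_ms > 0 and archive_period_ms < poll_period_ms:
        # Assumption: the period is checked at each poll, so events cannot
        # arrive faster than the polling period.
        warnings.append("archive period is shorter than the polling period")
    return warnings
```

For example, an attribute polled every 1000 ms with a 15000 ms archive period produces no warnings, while one polled every 5000 ms with a 1000 ms archive period is flagged.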


Well, I spent most of today yak-shaving; couldn't manage to get ZMQ happy during the build. I'm going to try again tomorrow and while that is going on, run HDBES and HDBCM on a separate machine and see if the behavior is consistent with the above, if only for more data.
Hi,

Did you try installing the latest cppzmq release (the zmq.hpp file)? The version of this file that some distributions ship with their libzmq packages is not compatible with cppTango.
We are working on a solution to notify the user during the cmake configuration process (see https://github.com/tango-controls/cppTango/pull/561).

I suggest installing cppzmq 4.3.0 (you can probably stick with your libzmq version) and compiling cppTango with CMake using the -DCPPZMQ_BASE=<cppzmq home folder> flag, as described in the INSTALL.md file.

How did you manage to get libzmq 4.3.2? It looks like it is not yet officially released (See https://github.com/zeromq/libzmq/releases). 4.3.2 is probably still a work in progress so I would not use it.
Latest official libzmq version seems to be v4.3.1.

Do not hesitate to share the compilation errors you get here on the forum or to create an issue on cppTango github repository.

Hoping this helps.
Reynald
I appreciate the help! I had grabbed the latest branch, not the release. Dumb of me.

I haven't filed a new bug report, as presently the 9.3.3-rc1 tar fails during the build if you naively upgrade from 9.2.5a on Lubuntu 18, which by default gets you zmq 4.2.5 and a cppzmq version I still have to dig up. I've started again with a fresh build to make sure this is the case (it should be; it looks like what the pull request was addressing) and, if so, will follow your lead on 4.3.0.
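
One way to dig up which cppzmq version an installed zmq.hpp corresponds to is to look for its version macros. The sketch below assumes the header defines CPPZMQ_VERSION_MAJOR, CPPZMQ_VERSION_MINOR and CPPZMQ_VERSION_PATCH, which recent cppzmq releases do as far as I know; older headers shipped by a distribution may not have them, in which case the function returns None. The function name is made up for this example.

```python
import re

# Matches lines such as "#define CPPZMQ_VERSION_MAJOR 4".
_VERSION_MACRO = re.compile(
    r"^\s*#\s*define\s+CPPZMQ_VERSION_(MAJOR|MINOR|PATCH)\s+(\d+)",
    re.MULTILINE,
)


def cppzmq_version(header_text):
    """Return (major, minor, patch) parsed from the text of zmq.hpp.

    Returns None when any of the three version macros is missing.
    """
    found = {name: int(num) for name, num in _VERSION_MACRO.findall(header_text)}
    if set(found) != {"MAJOR", "MINOR", "PATCH"}:
        return None
    return (found["MAJOR"], found["MINOR"], found["PATCH"])
```

To use it, read the header that the build actually picks up (its location depends on the system and on where cppzmq was installed) and pass the file contents to cppzmq_version().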

Trip report:

Fresh install of Lubuntu 18.10. Added the i386 architecture, then ran apt update and apt upgrade.

Installed 9.2.5a with sudo apt install of the following packages: cmake, mysql-server, libmysql-client, omniidl, tango-db, libtango-dev, python-pytango.

Installed openJDK by removing java, apt install openjdk8-dev. Installed java tools from https://people.debian.org/~picca/libtango-java_9.2.5a-1_all.deb .

Downloaded Tango 9.3.3 RC1.
mkdir build
cd build
../configure
sudo make -j4 install

Test 1:
Out of the box, make fails for the exact reason you described, the one the pull request addresses: zmq.hpp has some sort of incompatibility. It fails during the cpp/server/zmqeventconsumer.cpp build.

Next attempt: git clone master from cppZMQ. built and installed.
Back to tango-9.3.3, rm -rf build, mkdir build, cd build, ../configure, sudo make -j4 install

Passes server build, fails here:

../../../../lib/cpp/client/zmqeventconsumer.cpp:4070:31: error: no matching function for call to 'zmq::socket_t::recv(zmq::message_t*)'
sender.recv(&reply);

Next attempt: git clone v4.3.1 from cppZMQ, libzmq. built and installed.
Back to tango-9.3.3, rm -rf build, mkdir build, cd build, ../configure, sudo make -j4 install

Passes server build, fails here:

../../../../lib/cpp/client/zmqeventconsumer.cpp:4070:31: error: no matching function for call to 'zmq::socket_t::recv(zmq::message_t*)'
sender.recv(&reply);



Next attempt: git clone -b v4.3.0 from cppZMQ, libzmq. built and installed.
Back to tango-9.3.3, rm -rf build, mkdir build, cd build, ../configure, sudo make -j4 install

This now works. User error. Now to recompile HDBES and HDBCM with new lib and see whether attributes still hang.
YASSSSSSS now things are working properly at least at the ES layer (I still have some performance issues with the device server itself but that's a different issue). Thank you for the help Reynald!

Sidebar: it seems like most people here either run Debian or just use TangoBox. Is there any value in me doing a PR for the documentation on readthedocs on getting a working install on Lubuntu/Ubuntu-likes? Getting all the various jtango/pytango/itango/hdbes/hdbcm/hdbconfigurator/hdbviewer pieces working can be a bit of a slog, so I should probably update the docs.
Thank you very much for your feedback.
Contributions are always welcome! So please feel free to suggest changes to the documentation.
It will probably be useful for other users, or even for you, later :)

Please have a look at this page of the documentation to see how you can contribute to it:
https://tango-controls.readthedocs.io/en/latest/tutorials-and-howtos/tutorials/documentation-workflow-tutorial.html

Kind regards,
Reynald