Best practices for automation

Hej,

I am new to Tango but have some experience with other control systems in semiconductor automation. I am building a sort of plasma torch. I have played around with Tango device servers and have been able to do what I would define as the low-level IO for my machine, and, looking at Sardana, have been able to define macros to execute command series; you know, basically almost everything one would want out of a control system! :) However, one thing I have not figured out is the intended Tango way of enforcing automation. Let me explain what I mean:

Let’s take the case of my plasma torch. I have a low-level safety PLC designed to prevent certain personnel safety issues, like “hey, when hydrogen gas is flowing, you can’t open the oxygen valve” along with the codicil “hey, if the oxygen valve shows open, shut the hydrogen flow immediately”. These are both good things because quartz is frangible, and it makes sense that I implement these as fast safety PLC hardware. But there are other things that I might want to do that are either beyond the scope of a PLC or are not strictly safety related. A good example might be “hey, if reflected power in the RF source goes above $parameter, the plasma is probably out, try to relight automatically, and if after $parameter2 seconds the plasma isn’t relit, shut down the RF generator”.

For this case, what is the intended Tango way for something to sit and perform this work automatically?

The way I have approached this presently is to write a Tango client, Interlock-automation, which creates a DeviceProxy for each specific device that needs to be pulled into the interlock and automation routine, then goes into an infinite loop with a delay that checks each of the interlock conditions every $parameter3 seconds and, if a condition is met, performs the correct behaviour. That said, this seems…hacky? I can’t call Sardana macros from this context, so in the case of “relight the plasma” I can’t just use my existing macro and end up reimplementing the macro in native Tango instead.

Further, because I am just naively evaluating based on time instead of on bit changes or events, it’s really inefficient in terms of overhead on Tango. I know how I can refactor this to use events, but it would still be cumbersome.

Finally, what really stinks about this approach is that my interlock behavior is now intrinsically hardcoded to devices that may or may not exist, and isn’t very hardened to stupid. For example, in this way, if the generator device server breaks, my interlock client will puke, which means one of the other interlocks it is checking is no longer being serviced.
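
One way to keep a single broken device from stalling the other interlocks is to isolate each check. The following is only a minimal sketch of that idea, not Tango-specific code: the `checks` mapping and both function names are hypothetical, and each callable is assumed to wrap its own DeviceProxy reads and reactions.

```python
import logging
import time

log = logging.getLogger("interlock")


def run_checks_once(checks):
    """Run every interlock check once; return the names of checks that raised.

    `checks` maps a name to a zero-argument callable (hypothetical; in a real
    client each callable would wrap its own device reads and reactions).
    """
    failed = []
    for name, check in checks.items():
        try:
            check()
        except Exception:  # a broken device must not stop the remaining checks
            log.exception("interlock check %r could not be evaluated", name)
            failed.append(name)
    return failed


def run_forever(checks, period_s, cycles=None):
    """Poll all checks every `period_s` seconds (`cycles=None` means forever)."""
    done = 0
    while cycles is None or done < cycles:
        run_checks_once(checks)
        time.sleep(period_s)
        done += 1
```

With this structure, a check that raises because its device server is down is logged and reported, while the remaining checks are still serviced in the same cycle.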

The other way I considered to do this was to write the interlock behavior at the device server level but that seems backwards to have the device server trying to evaluate directly other device servers, even though it avoids some of the above issues.

Any thoughts on this?

Cheers,
Mark
Hi Mark,

Welcome to the Tango world! A sort of plasma torch? That sounds interesting!
In the Tango world, it is pretty common to have a device server that is a client of other Tango devices and performs evaluations or computations depending on the states and data coming from those devices.
If I were you, I would use a dedicated Tango device to do this evaluation job and to trigger the appropriate commands/macros depending on the result of the evaluation.

The infinite loop you were describing could be implemented using the Tango polling of a dedicated attribute or command in this new Tango device (e.g.: a command Evaluate could be executed every x seconds). It could also be done in a dedicated thread.
This Tango device could also use Tango events to be notified only when something relevant changes in the other devices.
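
For Mark's relight example, the evaluation step itself could be a small function that is called on every polling cycle or event. This is only a sketch under assumptions: `RelightState`, `evaluate`, the action names and the parameter names are all hypothetical, and the caller is assumed to translate the returned action into the actual command or macro call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelightState:
    # Time at which the current relight attempt started; None means the
    # plasma is believed to be lit.
    relight_started_at: Optional[float] = None


def evaluate(state, reflected_power, now, power_limit, relight_timeout_s):
    """Return the action for this cycle: "none", "relight", "wait" or "shutdown"."""
    if reflected_power <= power_limit:
        state.relight_started_at = None  # plasma is lit (again)
        return "none"
    if state.relight_started_at is None:
        state.relight_started_at = now  # first cycle above the limit
        return "relight"
    if now - state.relight_started_at >= relight_timeout_s:
        return "shutdown"  # relight did not succeed in time
    return "wait"  # relight already requested, still within the timeout
```

Keeping the decision separate from the device calls means the same function works whether it is driven by polling, by a worker thread or by an event callback.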

After the evaluation, this new Tango device could send a command to a Tango macro server device to execute a specific macro or spawn a thread and execute the macro sequence in this thread.

This new Tango device could even be exported by the same device server (same process) as your other Tango devices if you wish to (a Tango device server can export several devices from several Tango classes).

I personally prefer to implement this kind of behaviour in a Tango device server rather than in a client because we have tools to monitor the device servers and to ensure they are running correctly. The Starter device server can also be configured to automatically restart a crashed device server under certain conditions (in our control system, for instance, device servers are automatically restarted if they crash after having run for more than 2 hours).

The drawback of implementing the automation routine in a client is that you need to ensure the uniqueness of this client. With a device server, it is easy: you cannot run the same device server instance twice. With a client, you could be in trouble if there are several of them running at the same time.
Where I work, we try as much as possible to put the logic at the device server level and to have clients as simple as possible.
Another advantage of using a Tango device to do this evaluation work is that you could be notified if there is something wrong with the underlying devices. This new device could have its state reflecting the state of the other sub-devices (ON = everything OK, ALARM = at least one sub-device is in ALARM, UNKNOWN = at least one sub-device is not responding, etc.). You can implement your own logic here.
This new device can then be used with a TANGO alarm system to trigger some alarms if its state is not ON for instance and you could be notified if there is something wrong with this evaluation process.
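
The state aggregation described above could be sketched as follows. The function name, the use of plain strings for state names, and the priority order (a non-responding device outranks an alarm) are assumptions for illustration only.

```python
def aggregate_state(sub_states):
    """Combine sub-device states into one state for the automation device.

    `sub_states` maps a device name to a state name such as "ON" or "ALARM",
    or to None when the device did not respond.
    """
    states = list(sub_states.values())
    if any(s is None or s == "UNKNOWN" for s in states):
        return "UNKNOWN"  # at least one sub-device is not responding
    if any(s == "ALARM" for s in states):
        return "ALARM"  # at least one sub-device is in ALARM
    if all(s == "ON" for s in states):
        return "ON"  # everything OK
    return "ALARM"  # assumption: any other state is reported, not hidden
```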

Hoping this helps a bit,
Kind regards,
Reynald
Rosenberg's Law: Software is easy to make, except when you want it to do something new.
Corollary: The only software that's worth making is software that does something new.
Hi Mark,

As Reynald says, your application sounds very interesting! I agree with Reynald on all the points he made about implementing the automation loop in a device server rather than in a client. The automation device server is also a client of other device servers, so the main difference is that it is encapsulated in a device server, runs only once, and can be monitored and managed like other device servers.

An example of how at the ESRF we have used this approach is the procedure for topping up the e-beam every few minutes. The sequences to top up the e-beam are written in Python and executed via the Sardana Macro device server.

You mention that the client is heavy or exposed to errors from devices. This should not be an issue. If you use Tango events to receive updates from your lower-level devices, then you are informed immediately when something changes and your automation code can take action. If one or more devices are in error, you need to take this into account in your code and either ignore the error, protect the other devices, or both. Another way of reading multiple devices in parallel is to use asynchronous calls. These are fired off to the device server(s) while the client continues immediately. Once all devices have been contacted, the client code can synchronise with the answers and set a timeout so that it does not wait longer than a fixed time. For example, you could asynchronously read all devices and then set a timeout of N milliseconds before continuing. The devices that have not answered by then are marked as not responding and need to be treated accordingly in your automation algorithm.
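
The "fire off all reads, then wait at most N milliseconds" pattern can be illustrated with the Python standard library alone. Note that this sketch uses a thread pool as a stand-in for Tango asynchronous calls; `read_all` and `simulated_reader` are hypothetical names, and in a real device the callables would be replaced by actual device reads.

```python
import concurrent.futures
import time


def read_all(readers, timeout_s):
    """Start every read in parallel and wait at most `timeout_s` in total.

    `readers` maps a device name to a zero-argument callable (hypothetical
    stand-in for a device read). Returns (values, not_responding).
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(readers)))
    futures = {pool.submit(fn): name for name, fn in readers.items()}
    done, pending = concurrent.futures.wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block on devices that are still silent

    values = {}
    not_responding = [futures[f] for f in pending]
    for f in done:
        if f.exception() is None:
            values[futures[f]] = f.result()
        else:
            not_responding.append(futures[f])  # failed read: treat as unavailable
    return values, sorted(not_responding)


def simulated_reader(value, delay_s=0.0):
    """Hypothetical stand-in for a device read that answers after `delay_s`."""
    def read():
        time.sleep(delay_s)
        return value
    return read
```

For example, `read_all({"pressure": simulated_reader(42.0), "rf": simulated_reader(1.0, 1.0)}, 0.2)` returns the pressure value and lists `rf` as not responding, which the automation algorithm can then handle explicitly.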

Which programming language are you planning on writing your code in?

There is an example of using asynchronous commands in PyTango on this page:

https://tango-controls.readthedocs.io/en/latest/tutorials-and-howtos/how-tos/how-to-pytango.html

You can see an example of a Python device server with a worker thread here:

https://github.com/andygotz/nap

Thanks to Tiago for the recipe!

Cheers

Andy

Meh, it's just an ICP source with some novelty on the process and robot side. I've worked on ion implanters in a past life and wanted to start with a control system that can do some of the things that I'm used to.

Thank you for the advice! I've ported my client code over to a device server and used the idea of attributes to set the device names which is a really clever way to do it. I'm also going to try to move everything over to events instead of just polling heavily like I am now. In reality my device load is so low that it probably doesn't matter, but might as well do things right, eh?

Everything is in Python, as I want to be able to bring people onto the project as it grows and, to be honest, the majority of the work will be process work, so it will be non-computer scientists and non-software engineers working on it. All the physicists I know prefer Python, so Python it is.

One thing that I didn't really grasp about Tango and am now starting to: a device server isn't just a controls endpoint like it is on an EtherCAT bus, or an IOC like it is in EPICS; it's really any distributed controller that has the option of going to lower-level IO. Neat to know!

Thanks guys,
Mark
Wanted to come back to this: a big thank you to everyone for helping. I've implemented interlocking as device servers and that seems to work pretty well. One thing I realized is that PANIC can handle some of the more mundane interlock functionality (things like, "if you get a pressure burst, close the isolation valves so you can try to save the turbopump"), but PANIC falls down when trying to implement some of the more complex interlocking on a system. PANIC is also kind of backwards for the use case of panic prevention. It calls out the AAAAAAAAAAAAA after it's happening, but doesn't provide a stupidity prevention system: I haven't figured out a way, in PANIC, to prevent me from opening the isolation valve between my fully spun-up turbopump and a chamber at atmosphere.

This is a long-winded way of getting to my question: does anyone else in the ecosystem think there's value in extending PANIC to do command prevention? I'm leading the witness a bit here since I'm positive that this sort of functionality has to be realized at other sites somehow, and maybe I'm just not seeing the right tool.
Hi Amato,
PANIC is not meant to work as this kind of prevention tool. Usually, this functionality is provided by dedicated interlock systems based on PLCs (Programmable Logic Controllers, https://en.wikipedia.org/wiki/Programmable_logic_controller) or dedicated electronics that work close to the hardware.

Best regards,
Piotr
Understood, ish, but let me provide context, as I've done more than a few PLC systems in my life.

I used to work at Varian/Applied Materials doing ion implanters, the low-energy, dirty, toxic, industrial version of the high art of accelerators that most places running TANGO know. There, the control system was hilariously similar to a homebrew version of EPICS from 1994 and was a polyglot of code from ASM, Diamond Semiconductor Group, and Genus. It had three separate layers of interlocking:

- there was a safety PLC or other safety relaying preventing things that could cause personnel danger (like dumping an arsine bottle into an open chamber at atmosphere)
- there was a PLC or other machine safety relaying preventing things that could cause machine damage (like opening a valve that would shred a turbo)
- there was software interlocking for more complex things that would usually damage the wafer or machine (like "only allow motion of a robot in a particular direction based on the status of other robots, the status of that valve and that turbo, and the state of the arsine bottle being at a pressure greater than 5 torr for more than 20 minutes").

I accept that the last thing could be accomplished in a PLC that has more smarts than a simple PLC, but once you start pushing some of those automation smarts to the PLC, the PLC gets more expensive and more bogged down in tasking, the PLC code gets very complex, and you lose a lot of your flexibility to make a cogent R&D system or product. Varian/Applied solved this by taking an approach of "these things we are cool with letting be in software as they solely represent machine danger, not personnel danger". That allowed us to turn around machines quicker and at lower cost, since the IO was already in the higher-level control system and was effectively just using free cycles, versus having to have some sort of distributed interlocking scheme sitting at the PLC layer which would then connect back to the higher-level software.

Tango can absolutely approximate this behavior, either by having something like an interlock device server to which user processes connect and from which passthrough attributes can be made to the lower level device servers servicing the hardware, or by using the concept of a flag at each device server for each command that only is able to be set by an interlock device server, and that resets to a safe state if the interlock device server times out or isn't running.
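
The second idea, a per-command flag that only the interlock device sets and that falls back to a safe state when the interlock device stops refreshing it, could be sketched like this. Everything here is hypothetical and Tango-independent: `CommandPermit`, `open_isolation_valve` and the injectable `clock` are illustration names, and in a real device server the permit would be exposed through whatever attribute or command the interlock device writes to.

```python
import time


class CommandPermit:
    """Per-command permission flag that fails safe when it is not refreshed."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._granted_at = None  # None means "not permitted"

    def grant(self):
        """Called by the interlock device while its conditions are satisfied."""
        self._granted_at = self._clock()

    def revoke(self):
        """Called by the interlock device when a condition is violated."""
        self._granted_at = None

    def is_allowed(self):
        """True only if a grant arrived within the last `timeout_s` seconds."""
        if self._granted_at is None:
            return False
        return self._clock() - self._granted_at <= self._timeout_s


def open_isolation_valve(permit, actuate):
    """Hypothetical command body: refuse to act unless the permit is valid."""
    if not permit.is_allowed():
        raise PermissionError("open_isolation_valve blocked by interlock")
    actuate()
```

Because the permit expires on its own, a crashed or hung interlock device leaves every guarded command blocked rather than silently allowed.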

The second paradigm seems more Tango-like, and I guess it could even be used by PANIC if one coded PANIC vars to trigger the interlock command at each device server and handled them as events in PANIC.

In talking to the guys at ANSTO (granted, an EPICS facility), they basically eschew the concept of any higher-level software interlocking and handle it all in the PLC, which, again, is fine, but it also means that either their EPICS instance is replicating existing automation at the PLC level and the PLC blocks some of it, or their EPICS is basically a glorified MATLAB terminal reading off values from the PLC.

Does anyone else use a TANGO process to do behaviours like this?
Hi Amato,
well, as far as I know… no, mainly because of design choices, e.g. you do not rely on "high-level" software for machine protection (nor for personnel safety, by regulation). However, what you're describing is feasible, but I'd keep the interlocking mechanism separate from the alarm reporting system. You may just want to provide an additional layer (a FormulaConf device may help) on top of the devices servicing the hardware, implementing the interlock logic, which will enable or disable specific methods on the latter.
But, be clear, this is going to be a software-based machine protection…
Cheers,
Lorenzo