sys-tango-benchmark results

Hi All

SKA is interested in benchmarking performance of a large TANGO system in our environment. The https://github.com/tango-controls/sys-tango-benchmark tool looks very useful in this regard. Does anyone have some results from previous runs that they'd be willing to share? We're looking at 10k to 100k devices, but even smaller systems will be interesting for comparison. Maybe this is already available somewhere online?

Regards,
Anton
Hi Anton,

You are asking at just the right time :). We are now preparing a request for institutes to run a unified set of benchmarks and share the results.
Up to now, we have only run it on local virtual machines for test purposes.

Early next week we will provide a proposed configuration .yml file.

All the best,
Piotr
Hi Anton,

This sounds like a very interesting use case and, as Piotr pointed out, it arrives just as he is requesting benchmarks from all sites. We need these results before ICALEPCS.

In your case, I wonder what kind of metrics you are planning to measure. I can imagine measuring performance as a function of the number of clients per server, but in the case of many devices, what values do you want to measure: startup times, grouped calls, an individual client accessing one or more device servers, event performance? As you know, the Tango model implements point-to-point connections between clients and servers, so multiplying the number of device servers does not necessarily affect the performance of individual client-server connections. Are you planning to put 10k devices in one device server? Or are you looking to optimise the number of devices per device server?

Andy
Hi Andy, Piotr

Thanks for the replies. Glad to hear some tests and reports are planned. Andy, you make a good point about the point-to-point communications, which should remain very efficient.

We're thinking of looking at metrics like these:
- Start / initialisation time.
- Peak memory usage.
- Peak CPU usage.
- Some measure of query response time for attributes & commands & events, to N devices concurrently.
- How many devices can we run on a VM with, say, 1 CPU and 4 GB RAM? Does doubling the resources allow twice as many devices?
- Possibly the TANGO DB registration time (first-time population of the DB with all devices, attributes and properties), although this isn't a recurring cost, so it is not that important.
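To make the "query response time to N devices concurrently" metric a bit more concrete, here is a minimal sketch of how it could be timed. It is only an illustration: `measure_concurrent`, `timed_call` and `fake_read` are hypothetical names, the device names are made up, and `fake_read` just sleeps in place of a real query. In a real run it would be swapped for an actual client call (for example a PyTango `DeviceProxy(name).read_attribute(...)`), which is not exercised here.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(call, target):
    """Return the wall-clock duration in seconds of one call(target)."""
    start = time.perf_counter()
    call(target)
    return time.perf_counter() - start


def measure_concurrent(call, targets, max_workers=None):
    """Run call(target) for every target in parallel threads and summarise latency."""
    workers = max_workers or len(targets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        durations = list(pool.map(lambda t: timed_call(call, t), targets))
    return {
        "count": len(durations),
        "mean_s": statistics.mean(durations),
        "max_s": max(durations),
    }


def fake_read(device_name):
    # Hypothetical stand-in for a real attribute read or command call.
    time.sleep(0.01)


if __name__ == "__main__":
    # Made-up device names, for illustration only.
    names = [f"test/bench/{i}" for i in range(20)]
    print(measure_concurrent(fake_read, names))
```

Threads are used because each call mostly waits on the network; whether that matches how real clients would be distributed is an open assumption.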

In our environment everything is Dockerised. The plan is to use Kubernetes to orchestrate the TANGO control system. Early tests have shown that 1 device per device server per container doesn't scale very well - e.g. we had problems starting 2000 on a single machine. Multiple devices per server works better, with maybe 100 containers on a machine. We are looking at how to spread the load out and at giving guidelines to developers. Questions like:
- How many devices per device server?
- How many device servers per container?
- How many containers per VM?
- How much CPU and RAM per VM?
Obviously, it depends on what each device is doing, but we'd start with something simple.
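As a rough illustration of how those four questions interact, the arithmetic can be sketched as below. `plan_layout` is a hypothetical helper and the numbers in the example are placeholders, not measured limits; it only counts processes, containers and VMs and says nothing about CPU or RAM.

```python
import math


def plan_layout(total_devices, devices_per_server,
                servers_per_container, containers_per_vm):
    """Count device servers, containers and VMs needed for a given packing."""
    servers = math.ceil(total_devices / devices_per_server)
    containers = math.ceil(servers / servers_per_container)
    vms = math.ceil(containers / containers_per_vm)
    return {"servers": servers, "containers": containers, "vms": vms}


if __name__ == "__main__":
    # Illustrative values only: 10k devices, 100 devices per server,
    # 1 server per container, 100 containers per VM.
    print(plan_layout(10_000, 100, 1, 100))
```

Rounding up at each level matters once the numbers stop dividing evenly, since a partly filled server, container or VM still has to be started.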

Anton
Hi Anton,

You can start with a set of standard tests (prepared by Michal), so that results from different institutes can be compared.
See: https://github.com/tango-controls/sys-tango-benchmark-standard-tests

Regarding already available benchmarks, there are measurements of:
  • response times for attributes, pipes, commands and event subscriptions
  • the impact of dynamic attributes on memory consumption
  • how the start time is influenced by the number of devices within device servers

Feel free to propose or (even better :) ) to write additional tests.

Piotr
Thanks, Piotr - we'll take a look.
Hi,

Back to Anton's initial question: are there any results available already? Are they published somewhere?

Thanks!

Cheers,
@Ingvord, there are at least the results of tests made on AWS for the ICALEPCS paper: https://github.com/tango-controls/sys-tango-benchmark-standard-tests/tree/master/aws-ec2-tests

All the best, Piotr
Hi Piotr,

Thanks a lot!

That is already interesting to see!

Cheers,
I have looked through the test results and it seems to me that Java is unfairly slow.

First of all, some questions about the benchmark itself:

  1. Was the Java server warmed up before the measurement? Was it at least started with the -server flag?
  2. Do I understand correctly that there were a number of AWS instances per client?
  3. Was any tuning done before running the tests, such as setting the JacORB thread pool?

Sorry if these questions have been answered somewhere; I could not find the answers.

I have extracted the test server from the benchmark and written a simple test here

Running the test for 15 s with 64 clients (all on a single machine, though) gave me 124371 from WriteAttributeCounterCount, while the Amazon results are typically around 60K (about 2x lower).
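For reference, the arithmetic behind that comparison can be sketched as follows. This assumes the reported count is the total number of writes over the whole 15 s run and takes 60K as a round figure for the typical AWS result; both are assumptions, and `compare_counts` is just an illustrative helper name.

```python
def compare_counts(local_count, reference_count, duration_s):
    """Return the local/reference ratio and the local rate per second."""
    ratio = local_count / reference_count
    per_second = local_count / duration_s
    return ratio, per_second


if __name__ == "__main__":
    # 124371 is the local count quoted above; 60000 is an assumed round
    # value for the typical AWS result; 15 is the run length in seconds.
    ratio, per_second = compare_counts(124371, 60000, 15)
    print(f"ratio: {ratio:.2f}, local rate: {per_second:.1f} per second")
```

The comparison is only indicative, since the local run had all clients on one machine.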

Anyway, I have started to investigate this; you can track the progress here