Thread (Polling 1) memory performance

Hi All,

I am running a Java Tango DS and analyzing the application's performance using tools such as VisualVM, htop and Eclipse MAT.

Pain point: the Java Tango DS consumes approx. 1 GB of RAM while running.

Some Artifacts:

Below is a finding for a particular thread named "Polling 1" that I am unable to understand.

Source: Thread Dump
"Polling 1" #125 prio=5 os_prio=0 tid=0x00007fec0042b000 nid=0x8da runnable [0x00007feb7196b000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
	- locked <0x00000000872df570> (a java.io.BufferedOutputStream)
	at org.jacorb.orb.etf.StreamConnectionBase.flush(StreamConnectionBase.java:223)
	at org.jacorb.orb.giop.GIOPConnection.sendMessage(GIOPConnection.java:1088)
	at org.jacorb.orb.giop.GIOPConnection.sendRequest(GIOPConnection.java:1014)
	at org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:309)
	at org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:290)
	at org.jacorb.orb.Delegate._invoke_internal(Delegate.java:1371)
	at org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1209)
	at org.jacorb.orb.Delegate.invoke(Delegate.java:1197)
	at org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:475)
	at fr.esrf.Tango._Device_5Stub.read_attributes_5(_Device_5Stub.java:1490)
	at fr.esrf.TangoApi.DeviceProxyDAODefaultImpl.read_attribute(DeviceProxyDAODefaultImpl.java:1392)
	at fr.esrf.TangoApi.DeviceProxyDAODefaultImpl.read_attribute(DeviceProxyDAODefaultImpl.java:1323)
	at fr.esrf.TangoApi.DeviceProxy.read_attribute(DeviceProxy.java:794)
	at org.tcs.ncra.gmrt.tango.org.tango.lmcds.LMCDS.getElDifference(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.tango.server.attribute.ReflectAttributeBehavior.getValue(ReflectAttributeBehavior.java:85)
	at org.tango.server.attribute.AttributeImpl.updateValue(AttributeImpl.java:184)
	at org.tango.server.cache.AttributeCacheEntryFactory.createEntry(AttributeCacheEntryFactory.java:90)
	- locked <0x0000000084933168> (a org.tango.server.attribute.AttributeImpl)
	- locked <0x0000000084893ee0> (a java.lang.Object)
	at net.sf.ehcache.constructs.blocking.SelfPopulatingCache.refreshElement(SelfPopulatingCache.java:272)
	at net.sf.ehcache.constructs.blocking.SelfPopulatingCache.refresh(SelfPopulatingCache.java:159)
	at net.sf.ehcache.constructs.blocking.SelfPopulatingCache.refresh(SelfPopulatingCache.java:112)
	at org.tango.server.cache.CacheRefresher.run(CacheRefresher.java:57)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x0000000085a1d360> (a java.util.concurrent.ThreadPoolExecutor$Worker)
	- <0x00000000872df5a8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

Source: VisualVM CPU Sampler
Image: Thread Polling 1_CPU Sampler using VisualVM

Source: VisualVM Memory Sampler
Image: Thread Polling 1_Memory Sampler using VisualVM

Source: VisualVM Thread Status
The thread is in "park" state (brown color) most of the time.
Image: Thread Polling 1_Status using VisualVM


Source: Output of htop command
Image: htop Output

I need to understand the impact of this thread, Polling 1, on the memory consumption of the application.

Note: of all threads, Polling 1 has the highest allocated bytes/sec while the application is running.
Regards,
TCS_GMRT
Edited 1 year ago
Hi Team,

Any update…
Regards,
TCS_GMRT
Hi Team,

Any help will be much appreciated.
Regards,
TCS_GMRT
Hi TCS_GMRT,

Do you have more information, such as what you were polling and at what frequency? This would help us try to reproduce the problem. Perhaps you could share a small device class and its configuration so that we can run it here. We run Java device servers 24/7 without this issue (to my knowledge).

Andy
Hi Team,

Here are some recent statistics about the polling thread. I am wondering why the Polling 1 thread keeps consuming memory.

I am still trying to debug the application to identify the cause.

Notes:
1) The application had been running idle (no commands were sent) for 63 hours when this screenshot was taken.
2) The application currently has 325 attributes, most of them polled once per second. The attribute count can go up to 1000+ during the application's peak execution.
3) Of these 325, 4 are spectrum attributes (String data type) and the rest are scalar or dynamically created scalar attributes.

Any inputs will be useful.
Regards,
TCS_GMRT
Edited 5 months ago
Hi,

So, if I understand correctly, you have a Java device server which dynamically creates mostly scalar attributes, plus 4 DevString spectrum attributes (what is the size of these spectrum attributes?).

The number of attributes can vary from 325 to 1000.
They are all polled with a period of 1000 ms.

This means that during the peak period, when your device server has 1000 attributes, the Java device server polling thread has on average only 1 ms (1000 ms / 1000 attributes) to read each attribute. It looks like the polling thread is the thread consuming a lot of CPU and memory, and I'm not completely surprised by that.
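The per-attribute time budget above is simply the polling period divided by the number of attributes served by one polling thread. A minimal sketch of that arithmetic follows; the class and method names are hypothetical and are not part of any Tango API.

```java
// Hypothetical helper, not part of any Tango API: the average time available
// per attribute when one polling thread reads every attribute once per period.
final class PollingBudget {
    static double perAttributeBudgetMs(long periodMs, int attributeCount) {
        if (attributeCount <= 0) {
            throw new IllegalArgumentException("attributeCount must be positive");
        }
        return (double) periodMs / attributeCount;
    }

    public static void main(String[] args) {
        // Current load from this thread: 325 attributes at 1000 ms (about 3.08 ms each).
        System.out.println(perAttributeBudgetMs(1000, 325));
        // Peak load from this thread: 1000 attributes at 1000 ms (1 ms each).
        System.out.println(perAttributeBudgetMs(1000, 1000));
    }
}
```

If a single read takes longer than this budget on average, the thread cannot complete one full pass within the polling period.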

The Polling 1 thread is simply a thread in the Java device server dedicated to reading all the attributes that are configured to be polled, once every polling period (1 s in your case). So it executes the getValue() method of your dynamic attribute classes.
Maybe you should look at what is done in this getValue() method, to check whether it allocates memory unnecessarily.
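As an illustration of the kind of thing to look for, here is a minimal sketch contrasting a read method that creates a costly object on every poll with one that creates it once and reuses it. All class names are hypothetical stand-ins; this is not a claim about what your getValue() or getElDifference() actually does, and no Tango API is used.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ReadMethodSketch {
    // Hypothetical stand-in for any object that is costly to create,
    // such as a proxy or connection to another device.
    static final class CostlyHandle {
        static final AtomicInteger CREATED = new AtomicInteger();
        CostlyHandle() { CREATED.incrementAndGet(); }
        double read() { return 42.0; } // placeholder value
    }

    // Allocates a new handle on every poll.
    static final class WastefulReader {
        double getValue() { return new CostlyHandle().read(); }
    }

    // Creates the handle once and reuses it on every poll.
    static final class ReusingReader {
        private final CostlyHandle handle = new CostlyHandle();
        double getValue() { return handle.read(); }
    }

    public static void main(String[] args) {
        WastefulReader wasteful = new WastefulReader();
        for (int poll = 0; poll < 1000; poll++) wasteful.getValue();
        int wastefulCount = CostlyHandle.CREATED.getAndSet(0);

        ReusingReader reusing = new ReusingReader();
        for (int poll = 0; poll < 1000; poll++) reusing.getValue();
        int reusingCount = CostlyHandle.CREATED.get();

        // prints: handles created: wasteful=1000, reusing=1
        System.out.println("handles created: wasteful=" + wastefulCount
                + ", reusing=" + reusingCount);
    }
}
```

With hundreds of attributes polled every second, per-call allocations like the first variant add up quickly in a memory sampler, even when each object is short-lived.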

The idea of the polling thread, in case it is not clear yet, is to read your attributes every 1 s (as you configured them). It puts the values that were read into a polling ring buffer. If a client tries to read the attribute, by default the device server looks into its polling buffer and returns the last cached value read for this attribute by the polling thread, without having to execute the getValue() method of your dynamic attribute class again. In this configuration, the getValue() methods are invoked only by the polling thread, whatever the number of clients.
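The caching behaviour described above can be sketched in a few lines. This is a simplified illustration only: the class and method names are hypothetical, and the real Java server relies on its own cache classes (the stack trace in the first post shows org.tango.server.cache and ehcache's SelfPopulatingCache), not on this code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of a polled attribute: only poll() runs the read logic,
// while clients are served from the last cached value.
final class PolledAttributeSketch<T> {
    private final Supplier<T> getValue; // stands in for the attribute's getValue()
    private final AtomicReference<T> cached = new AtomicReference<>();

    PolledAttributeSketch(Supplier<T> getValue) {
        this.getValue = getValue;
    }

    // Called only by the polling thread, once per polling period.
    void poll() {
        cached.set(getValue.get());
    }

    // Called by any number of clients; never triggers getValue().
    T readFromCache() {
        return cached.get();
    }

    public static void main(String[] args) {
        AtomicInteger invocations = new AtomicInteger();
        PolledAttributeSketch<Integer> attribute =
                new PolledAttributeSketch<>(invocations::incrementAndGet);

        attribute.poll(); // one polling cycle
        for (int client = 0; client < 100; client++) {
            attribute.readFromCache(); // 100 client reads
        }
        System.out.println(invocations.get()); // prints 1
    }
}
```

The number of getValue() invocations depends only on the number of polling cycles, not on the number of client reads.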

The polling thread is necessary to send events too (when they are not pushed by code). Tango will take care of sending events when necessary (depending on your attribute event configuration (change/archive event thresholds/period)) after each getValue() invocation by the polling thread.

The polling buffer has a depth too. I think that by default this poll ring depth is set to 10. This means that, by default, for each attribute, the polling thread stores the last 10 values in a ring buffer. So in your best case (325 attributes), the polling thread stores 10 * 325 = 3250 attribute values, with their timestamps, attribute quality factors and possibly exceptions.
Some Tango clients can read the last n values stored in the polling buffer. Atkmoni (AtkPanel, View menu -> Numeric & State Trend) does that, for instance, to retrieve some values from the past and display the first values of the trend.
This poll ring depth is configurable with Jive (see attached screenshot) and can be set to a value larger than 10. We have some use cases at the ESRF where we store the last hour in the polling buffer, for example (attribute polling period = 1000 ms and poll ring depth = 3600).
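To make the ring depth concrete, here is a minimal sketch of a fixed-depth ring that keeps only the most recent values, together with the entry count for a whole device. The class is hypothetical and is not the Tango implementation; the combination of 325 attributes with a depth of 3600 is an assumed example, not a configuration reported in this thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical fixed-depth ring: keeps the last `depth` polled values.
final class PollRingSketch<T> {
    private final int depth;
    private final Deque<T> values = new ArrayDeque<>();

    PollRingSketch(int depth) {
        if (depth <= 0) {
            throw new IllegalArgumentException("depth must be positive");
        }
        this.depth = depth;
    }

    // Store a newly polled value, discarding the oldest once the ring is full.
    void add(T value) {
        if (values.size() == depth) {
            values.removeFirst();
        }
        values.addLast(value);
    }

    // Oldest-to-newest copy of what is currently stored.
    List<T> history() {
        return new ArrayList<>(values);
    }

    // Entries held for a device with one ring per attribute.
    static long totalEntries(int attributeCount, int depth) {
        return (long) attributeCount * depth;
    }

    public static void main(String[] args) {
        PollRingSketch<Integer> ring = new PollRingSketch<>(10);
        for (int i = 1; i <= 25; i++) {
            ring.add(i);
        }
        System.out.println(ring.history());          // [16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
        System.out.println(totalEntries(325, 10));   // 3250
        System.out.println(totalEntries(325, 3600)); // 1170000
    }
}
```

The memory held by the polling buffer grows linearly with both the attribute count and the ring depth, so a large depth on many attributes is worth keeping in mind when reading heap statistics.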

I hope these explanations help you to better understand the role of the polling thread.

Kind regards,
Reynald
Rosenberg's Law: Software is easy to make, except when you want it to do something new.
Corollary: The only software that's worth making is software that does something new.
Edited 5 months ago
Hi Reynald,

Thanks for the prompt help on the Polling Thread details.

I would also like to understand whether there is a way to split the attributes to be polled across several polling threads.
That is, can we have, say, 100 attributes polled by polling thread 1,
the next 100 attributes polled by polling thread 2, and so on? In the end there would be 10 polling threads polling 1000 attributes during the application's peak execution.
Regards,
TCS_GMRT
Well, yes and no. TANGO supports the concept of a polling thread pool per device server instance, that is, a pool of polling threads polling all the devices belonging to the same server instance. That does not seem to be your case, with a single device publishing up to 1000 attributes. Moreover, even splitting the load over several threads may not solve your problem, since each method still has to cope with a maximum allowed execution time of 1 ms to keep up with the specified polling setup.
Cheers,
Lorenzo
We want to understand this a little more. How does TANGO treat attributes defined with different polling periods for a given DS? For example, if I have 10 attributes to be polled at 1000 ms and 10 attributes to be polled at 5000 ms, will the Polling 1 thread alone still take care of polling all 20 attributes?

Does it have different polling threads to take care of attributes whose values vary at a fast, medium or slow rate? Fast, medium and slow could be defined by polling periods of 0 - 1000 ms, 1000 - 10000 ms and greater than 10000 ms respectively.
Regards,
TCS_GMRT
TCS_GMRT wrote:
We want to understand this a little more. How does TANGO treat attributes defined with different polling periods for a given DS? For example, if I have 10 attributes to be polled at 1000 ms and 10 attributes to be polled at 5000 ms, will the Polling 1 thread alone still take care of polling all 20 attributes?

Yes

TCS_GMRT wrote:
Does it have different polling threads to take care of attributes whose values vary at a fast, medium or slow rate?

No
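To illustrate how a single thread can serve several polling periods, here is a minimal sketch using a single-threaded ScheduledExecutorService from the JDK (the stack trace in the first post shows the polling thread running inside a ScheduledThreadPoolExecutor). The class name, counters and scaled-down periods are assumptions made for the demo; this is not the Tango server code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: two periodic "polls" with different periods, one worker thread.
final class SinglePollingThreadSketch {
    final Set<String> threadsSeen = ConcurrentHashMap.newKeySet();
    final AtomicInteger fastReads = new AtomicInteger();
    final AtomicInteger slowReads = new AtomicInteger();

    // Runs both periodic tasks on one worker thread for roughly runMs milliseconds.
    void run(long fastPeriodMs, long slowPeriodMs, long runMs) {
        ScheduledExecutorService poller =
                Executors.newSingleThreadScheduledExecutor(r -> new Thread(r, "Polling 1"));
        poller.scheduleAtFixedRate(() -> {
            threadsSeen.add(Thread.currentThread().getName());
            fastReads.incrementAndGet();
        }, 0, fastPeriodMs, TimeUnit.MILLISECONDS);
        poller.scheduleAtFixedRate(() -> {
            threadsSeen.add(Thread.currentThread().getName());
            slowReads.incrementAndGet();
        }, 0, slowPeriodMs, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(runMs);
            poller.shutdownNow();
            poller.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            poller.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        SinglePollingThreadSketch sketch = new SinglePollingThreadSketch();
        // Periods scaled down from 1000 ms / 5000 ms to keep the demo short.
        sketch.run(100, 500, 1200);
        System.out.println("threads used: " + sketch.threadsSeen);
        System.out.println("fast reads: " + sketch.fastReads.get()
                + ", slow reads: " + sketch.slowReads.get());
    }
}
```

Because both tasks share one worker, a slow read in either task delays the other, which is why the time spent in each read method matters regardless of its own polling period.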
 