Segmentation fault: “Source file is more recent than executable”

Hello all,

I am using PyTango 9.2.2 and have observed a segmentation fault. Running Python under GDB gives the following log:


[Switching to Thread 0x7fffa37fe700 (LWP 6658)]
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign (__s=0x7fff980cb8e0 "Not specified", this=0x0)
at /usr/include/c++/5/bits/basic_string.h:1167
warning: Source file is more recent than executable.
1167 traits_type::length(__s));



It seems to be an issue with a C++ string API used by PyTango or Taurus.

Please let me know how I can get more information about this bug to help us debug it.

Thanks,
Hitesh Patel
Regards,
TCS_GMRT
If I were you, I would use gdb commands to get more information about where the process crashed.
Please refer to https://wiki.python.org/moin/DebuggingWithGdb, for instance, to see which commands you can run in gdb to get more details.

From a quick look at this page, I think the output of the following gdb commands might help you:

bt

and
py-bt

py-list
might also help you.
You need to have gdb on your system and Python debugging extensions to use the python-specific gdb commands.
The web page I mentioned before (https://wiki.python.org/moin/DebuggingWithGdb) provides instructions on how to install these extensions.
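
As a complement to py-bt, the standard-library faulthandler module can write the Python traceback of every thread when the process receives a fatal signal such as SIGSEGV. The sketch below assumes Python 3.3 or later (where faulthandler ships with the interpreter); the helper name and the log path are only illustrative. If the crashing thread runs pure C++ code, it will have no Python frames to show, but the dump can still tell you what the Python threads were doing at that moment.

```python
import faulthandler


def enable_crash_tracebacks(log_path):
    """Dump the Python traceback of every thread to log_path on a fatal signal."""
    # The file must stay open for the lifetime of the process: faulthandler
    # keeps only its file descriptor and writes to it from the signal handler.
    log_file = open(log_path, "w")
    # all_threads=True reports every Python thread, not only the faulting one.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Call it once at the very start of the client script, for example crash_log = enable_crash_tracebacks("/tmp/pytango_crash.log"), and keep the returned file object referenced until the process exits.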

I've never used gdb with Python so far, so other, more experienced people might be able to help you better than me…
Rosenberg's Law: Software is easy to make, except when you want it to do something new.
Corollary: The only software that's worth making is software that does something new.
Thank you, Reynald, for the debugging steps.

I ran py-bt and bt to get the Python and C stack traces, respectively.

I got the stack traces below. Can anybody help me resolve this issue?


Thread 14 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffa37fe700 (LWP 30705)]
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign (__s=0x7fff98061880 "Not specified", this=0x0)
at /usr/include/c++/5/bits/basic_string.h:1167
warning: Source file is more recent than executable.
1167 traits_type::length(__s));



(gdb) py-bt
Traceback (most recent call first):
(gdb)
Traceback (most recent call first):


(gdb) bt
#0 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign (__s=0x7fff98061880 "Not specified", this=0x0)
at /usr/include/c++/5/bits/basic_string.h:1167
#1 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator= (__s=<optimized out>, this=<optimized out>)
at /usr/include/c++/5/bits/basic_string.h:559
#2 Tango::_AttributeInfoEx::operator= (this=this@entry=0x7fff98061a20, att_5=att_5@entry=0x21ae8f0) at api_util.cpp:1852
#3 0x00007ffff2131a56 in Tango::ZmqEventConsumer::push_zmq_event (this=this@entry=0x21ae0b0,
ev_name="tango://cmsserver1:10000/mnc/cmc/agn6/agnmncstatus.idl5_attr_conf", endian=endian@entry=0 '\000', event_data=…, error=<optimized out>,
ds_ctr=@0x7fff9805f854: 0) at zmqeventconsumer.cpp:2287
#4 0x00007ffff2132c63 in Tango::ZmqEventConsumer::process_event (this=this@entry=0x21ae0b0, received_event_name=…, received_endian=…,
received_call=…, event_data=…) at zmqeventconsumer.cpp:589
#5 0x00007ffff2133f12 in Tango::ZmqEventConsumer::run_undetached (this=0x21ae0b0, arg=<optimized out>) at zmqeventconsumer.cpp:320
#6 0x00007ffff11637e1 in omni_thread_wrapper () from /usr/local/lib/libomnithread.so.4
#7 0x00007ffff7bc16ba in start_thread (arg=0x7fffa37fe700) at pthread_create.c:333
#8 0x00007ffff78f741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109




Is this issue related to the version of bits/basic_string.h (line 1167), or to the C library version?

Thanks,
Hitesh Patel
Regards,
TCS_GMRT
Hi Hitesh,

It looks like you ran into this bug: https://sourceforge.net/p/tango-cs/bugs/828/
This bug is fixed in Tango 9.2.5a.

Cheers,
Reynald




Hi Reynald,

I have observed a core dump with Tango 9.2.5a, omniORB-4.2.1 and zeromq-4.0.7.

Below is the C traceback:

(gdb) bt
#0 0x00007ffff7825428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff782702a in __GI_abort () at abort.c:89
#2 0x00007ffff1c8f84d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff1c8d6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff1c8d701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff1c8d919 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007fffdef25064 in Tango::Except::throw_exception (reason=<optimized out>, desc=<optimized out>, origin=<optimized out>,
sever=<optimized out>) at /usr/local/include/tango/except.h:135
#7 0x00007fffde4fc60d in Tango::TangoMonitor::get_monitor (this=0x25f7190) at ../../../lib/cpp/server/tango_monitor.h:150
#8 0x00007fffde51694d in Tango::DelayEvent::DelayEvent (this=0x7fffb144dd30, ec=<optimized out>) at zmqeventconsumer.cpp:3792
#9 0x00007fffde506ba9 in Tango::EventConsumerKeepAliveThread::run_undetached (this=0x2631200, arg=<optimized out>)
at eventkeepalive.cpp:573
#10 0x00007fffdd5517e1 in omni_thread_wrapper () from /usr/local/lib/libomnithread.so.4
#11 0x00007ffff7bc16ba in start_thread (arg=0x7fffb144e700) at pthread_create.c:333
#12 0x00007ffff78f741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109


Python traceback:

(gdb) py-bt
Traceback (most recent call first):
(gdb) py-list
Unable to locate gdb frame for python bytecode interpreter


Any help would be greatly appreciated.

Thanks,
Hitesh Patel
Regards,
TCS_GMRT
Hi Hitesh,

Did you experience this crash only once or is it something easy to reproduce?
Could you please reproduce this in a simple code sample and provide us the source code?
This would greatly help us to debug this issue.

Kind regards,
Reynald
Hi Reynald,

Thanks for your reply.

I experience this frequently. According to the GDB logs, it looks like an issue with an interface change event:

(gdb) py-bt
Traceback (most recent call first):
(gdb) py-list
Unable to locate gdb frame for python bytecode interpreter
(gdb) bt
#0 0x00007fffde51d6f7 in Tango::ZmqEventConsumer::push_zmq_event (this=this@entry=0x251f470,
ev_name="tango://01hw499468:10000/mnc/cmc/agn1.intr_change", endian=endian@entry=0 '\000', event_data=…, error=<optimized out>,
ds_ctr=<optimized out>) at zmqeventconsumer.cpp:2816
#1 0x00007fffde521ae3 in Tango::ZmqEventConsumer::process_event (this=this@entry=0x251f470, received_event_name=…, received_endian=…,
received_call=…, event_data=…) at zmqeventconsumer.cpp:594
#2 0x00007fffde522d92 in Tango::ZmqEventConsumer::run_undetached (this=0x251f470, arg=<optimized out>) at zmqeventconsumer.cpp:320
#3 0x00007fffdd5517e1 in omni_thread_wrapper () from /usr/local/lib/libomnithread.so.4
#4 0x00007ffff7bc16ba in start_thread (arg=0x7fffc5278700) at pthread_create.c:333
#5 0x00007ffff78f741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

I will try to reproduce it with simple source code. Our application creates 200+ client proxies and subscribes each of them to the interface change event, so a simple sample may not behave the same way in terms of memory and CPU usage.
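
A reproducer along these lines might look like the sketch below. It is only an outline under assumptions: the function names are hypothetical, the device names have to come from your own system, and the import assumes a PyTango release that provides the tango module (on older releases, import PyTango as tango). As far as I know, DeviceProxy.subscribe_event takes no attribute name for INTERFACE_CHANGE_EVENT.

```python
import time


def subscribe_all(device_names, make_proxy, event_type, callback):
    """Create one proxy per device name and subscribe it to event_type.

    Returns (proxy, event_id) pairs so the caller can unsubscribe later.
    """
    subscriptions = []
    for name in device_names:
        proxy = make_proxy(name)
        # Interface change events are per device, so no attribute name is passed.
        event_id = proxy.subscribe_event(event_type, callback)
        subscriptions.append((proxy, event_id))
    return subscriptions


def run_reproducer(device_names, duration_s=60.0):
    """Hypothetical driver: needs PyTango installed and reachable devices."""
    import tango  # lazy import, so subscribe_all has no hard dependency

    def on_event(event):
        # Keep the callback trivial so a crash points at the event machinery.
        print(event)

    subs = subscribe_all(device_names, tango.DeviceProxy,
                         tango.EventType.INTERFACE_CHANGE_EVENT, on_event)
    try:
        time.sleep(duration_s)  # let events arrive on the consumer thread
    finally:
        for proxy, event_id in subs:
            proxy.unsubscribe_event(event_id)
```

Passing the proxy factory as an argument keeps the subscription loop separate from PyTango, so the number of proxies can be scaled up towards 200 without changing the loop itself.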
Regards,
TCS_GMRT
Hi Reynald,

Please find below some more debug logs from the C trace taken when the core dump occurred; they may help you guide us.

omniORB: From endpoint: giop:tcp:192.168.70.2:39940. Detected GIOP 1.2 protocol error in input message. giopImpl12.cc:409. Connection is closed.

Thread 40 "python" received signal SIGPIPE, Broken pipe.

[Switching to Thread 0x7fff6f7fe700 (LWP 8552)]
0x00007ffff7bca9ff in __libc_send (fd=240, buf=0x7fffa4524218, n=112, flags=0) at ../sysdeps/unix/sysv/linux/x86_64/send.c:26
26 ../sysdeps/unix/sysv/linux/x86_64/send.c: No such file or directory.
(gdb) bt
#0 0x00007ffff7bca9ff in __libc_send (fd=240, buf=0x7fffa4524218, n=112, flags=0) at ../sysdeps/unix/sysv/linux/x86_64/send.c:26
#1 0x00007ffff148da70 in omni::unixConnection::Send(void*, unsigned long, unsigned long, unsigned long) () from /usr/lib/libomniORB4.so.1
#2 0x00007ffff144dca1 in omni::giopStream::sendChunk(omni::giopStream_Buffer*) () from /usr/lib/libomniORB4.so.1
#3 0x00007ffff1461923 in omni::giopImpl12::outputMessageEnd(omni::giopStream*) () from /usr/lib/libomniORB4.so.1
#4 0x00007ffff1452f87 in omni::GIOP_C::InitialiseRequest() () from /usr/lib/libomniORB4.so.1
#5 0x00007ffff1438aea in omniRemoteIdentity::dispatch(omniCallDescriptor&) () from /usr/lib/libomniORB4.so.1
#6 0x00007ffff141fa65 in omniObjRef::_invoke(omniCallDescriptor&, bool) () from /usr/lib/libomniORB4.so.1
#7 0x00007ffff101105b in omni::RequestImpl::deferred_invoke() () from /usr/lib/libomniDynamic4.so.1
#8 0x00007ffff0fd8bc4 in omni::DeferredRequest::execute() () from /usr/lib/libomniDynamic4.so.1
#9 0x00007ffff140556d in omniAsyncWorkerInfo::run() () from /usr/lib/libomniORB4.so.1
#10 0x00007ffff1405bff in omniAsyncWorker::run(void*) () from /usr/lib/libomniORB4.so.1
#11 0x00007ffff0207779 in omni_thread_wrapper () from /usr/lib/libomnithread.so.3
#12 0x00007ffff7bc16ba in start_thread (arg=0x7fff6f7fe700) at pthread_create.c:333
#13 0x00007ffff78f741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109


Can you please explain what is meant by "Detected GIOP 1.2 protocol error in input message. giopImpl12.cc:409. Connection is closed."?


Thanks,
Hitesh Patel
Regards,
TCS_GMRT
Hi Hitesh,

It looks like you have encountered several different problems.
All your backtraces are different.
It is very difficult to guess the origins of your problems from what you have provided.
It would be great to have an easy way to reproduce the crashes you experienced, so that we could debug the problems more easily.
I would add that the last crash is very suspicious: only omniORB methods are listed in the backtrace you provided, so it could be an omniORB bug, or something in your device server or in the Tango library that corrupts memory.
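
One note on the SIGPIPE trace: gdb stops on SIGPIPE by default even when the program ignores that signal, so that stop is not necessarily a crash in itself. CPython normally sets SIGPIPE to be ignored at startup, in which case a failed send() returns an error instead of killing the process. Assuming nothing in the application or its libraries changes that setting (I have not verified this for Tango or omniORB), the small check below, with a hypothetical helper name, shows the current disposition from inside the Python process.

```python
import signal


def sigpipe_is_ignored():
    """Return True if this process ignores SIGPIPE, None if the platform has no SIGPIPE."""
    if not hasattr(signal, "SIGPIPE"):
        return None
    return signal.getsignal(signal.SIGPIPE) == signal.SIG_IGN


print("SIGPIPE ignored:", sigpipe_is_ignored())
```

If it reports that SIGPIPE is ignored, running "handle SIGPIPE nostop noprint pass" in gdb lets the session continue past these stops until the real crash.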

Kind regards,
Reynald
 