[Home]

Summary:ASTERISK-24152: memory leak and segfault with in libpthread-2.17.so
Reporter:Yuriy Gorlichenko (ovoshlook)Labels:
Date Opened:2014-08-01 06:15:31Date Closed:2016-10-06 00:31:26
Priority:MajorRegression?
Status:Closed/CompleteComponents:Channels/chan_sip/General Resources/res_pjsip
Versions:12.4.0 Frequency of
Occurrence
Related
Issues:
is related toASTERISK-22945 [patch] Memory leaks in chan_sip.c with realtime peers
is related toASTERISK-24307 Unintentional memory retention in stringfields
Environment:12.4.0 Ubuntu 13.10Attachments:( 0) backtrace_bt_full.txt
( 1) backtrace.txt
( 2) memsummary_hour.txt
( 3) memsummary_start.txt
( 4) mmlog
( 5) valgrind.txt
Description:Hello. I have issue with asterisk 12.4.0 with memory leak.

I have installed asterisk 12.4.0 at Ubuntu 13.10 with realtime chan_sip and pjsip modules (at diffrent ports: chan_sip used as trunk channel and pjsip runs as client side).

I see at syslog that my system fails futh segfault:
kernel: [17857.689103] asterisk[23250]: segfault at 2cf ip 00007ff84a294164 sp 00007ff79aeb75a0 error 4 in libpthread-2.17.so[7ff84a28b000+17000]

Then i open "top" appliation and monitor memory that used by asterisk. I monitored it at 1.5 hour and all this time memory usage grows (number of calls at one time always 2-5). So at start I see 3,3% of usage and then after 1.3 hour I see 11,5% usage.

I recompiled my asterisk with MALLOC_DEBUG and use from asterisk CLI "memory show summary" at start and after hour of work.


Comments:By: Yuriy Gorlichenko (ovoshlook) 2014-08-01 06:17:21.129-0500

There are summary files of my issue.

By: Yuriy Gorlichenko (ovoshlook) 2014-08-04 04:23:43.388-0500

Attached backtrace

By: Yuriy Gorlichenko (ovoshlook) 2014-08-05 06:38:19.180-0500

Does My Issue read for asterisk developers? Today is the 5-th day with no answer....

By: Rusty Newton (rnewton) 2014-08-07 15:22:34.189-0500

Your issue is in queue to be looked at. There are currently 1,038 Asterisk issues on the tracker in varying non-closed states. Many of which require responses and developer activity. You will have to be patient with us. As soon as a developer is able to investigate the issue, then it will happen.

Thanks for posting the appropriate debug and traces. Is there a chance you could also run this system under [Valgrind|https://wiki.asterisk.org/wiki/display/AST/Valgrind] ?

Can you provide the mmlog file output from your current setup using MALLOC_DEBUG ? Both at the start and after an hour? [MALLOC_DEBUG|https://wiki.asterisk.org/wiki/display/AST/MALLOC_DEBUG+Compiler+Flag]

By: Rusty Newton (rnewton) 2014-08-07 15:25:24.991-0500

In addition to the previous requests, you might also provide
* packet capture (tcpdump or wireshark) of traffic received and sent by the system right at the time of crash.
* Asterisk verbose log (debug if it doesn't kill the system) up to the time of the crash (or at least showing a short time before). [Collecting Debug Information|https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information].

By: Rusty Newton (rnewton) 2014-08-07 15:32:00.217-0500

Another comment! Out of sheer curiosity. Why are you using the older channel driver for trunks and the newer channel driver for clients? Why not use res_pjsip for both?

By: Yuriy Gorlichenko (ovoshlook) 2014-08-08 00:30:37.071-0500

Thanks for reply!!!
I will provide Valgrind with mmkog and debug at this weekend

About 2 channels:

res_pjsip does not provide (as I see) option like "insecure", So I need for this option, and I found solution for myself - to use sip_chan.
I see more pluses of this architecture:
I have 2 different tables for clients and trunks.
It is comforable to use with REALTIME integration
But I should note that res_pjsip works with realtime better than sip_chan:

At chan_sip (dynamic realtime) I must restart system for adding new trunk (that issue I at many systems developed by me ) If i do "sip reload" my system can not register all of trunks

This issue was described by my colleague at this topic
ASTERISK-23542


By: Yuriy Gorlichenko (ovoshlook) 2014-08-08 01:37:16.757-0500

One else comment!

When asterisk fails down I see last message at console (I'onitored calls of my company and catch this moment)
Maybe it helps
taskprocessor.c:612 __allocate_taskprocessor: Unable to start taskprocessor listener for taskprocessor f0c0cc1a-61a2-441b-a4d2-df2bf5e676e6

When syslog prints message at header of this issue (about segfault)

By: Yuriy Gorlichenko (ovoshlook) 2014-08-12 00:19:23.815-0500

Sorry for long answer- do not have out of work window to reinstall system with MALLOC_DEBUG. try at week Do not close issue please.

By: Yuriy Gorlichenko (ovoshlook) 2014-08-13 00:13:59.748-0500

One More question before I will run system under Valgrind:

Must I run my system with MALLOC_DEBUG for taking logs with Valgrind or it does't matter how to run system with this test?



By: Richard Mudgett (rmudgett) 2014-08-13 09:58:16.682-0500

MALLOC_DEBUG and valgrind are not compatible with each other because they try to do the same thing.  MALLOC_DEBUG will prevent valgrind from finding invalid memory accesses.  Use one or the other.

By: Yuriy Gorlichenko (ovoshlook) 2014-08-13 13:22:13.413-0500

Ok. I've collected logs at valgrind.txt and mmlog. But mmlog doesn`t help I think, because it have only newsession writes (when astersik starts after fail by my starting script).



By: Yuriy Gorlichenko (ovoshlook) 2014-08-19 08:05:10.311-0500

Does my logs not enough for your feedback?

Let me know...

By: Matt Jordan (mjordan) 2014-08-20 17:05:42.528-0500

Unfortunately, valgrind wasn't able to process the number of threads you had on your system. The errors it did push out to the log were not memory leaks, so much as conditional jumps/moves in {{libmyodbc}}, {{libpj}}, or {{libspeex}}. So no, that did not show the memory leak.

Asterisk's MALLOC_DEBUG will only dump out the leaked items on exit if you tell it to do so via the CLI commands. There will be items leaked on exit; those are generally harmless (if annoying). A systemic memory leak would hopefully show up via that.

That being said, if valgrind isn't outputting it, it's doubtful that MALLOC_DEBUG will catch it.

You may be best off attempting to determine which call flow increases the memory leak, and then replicating that under valgrind.

By: Walter Doekes (wdoekes) 2014-08-21 04:34:53.439-0500

so.

{noformat}
ast_taskprocessor_get
...
       /* Create a new taskprocessor. Start by creating a default listener */
       pvt = default_listener_pvt_alloc();
       if (!pvt) {
               return NULL;
       }
       listener = ast_taskprocessor_listener_alloc(&default_listener_callbacks, pvt);
       if (!listener) {
               default_listener_pvt_destroy(pvt);
               return NULL;
       }

       p = __allocate_taskprocessor(name, listener);
{noformat}

pvt holds this
{noformat}
pvt->poll_thread = AST_PTHREADT_NULL
{noformat}

we continue
{noformat}
__allocate_taskprocessor
...
       if (p->listener->callbacks->start(p->listener)) {
               ast_log(LOG_ERROR, "Unable to start taskprocessor listener for taskprocessor %s\n", p->name);
               ast_taskprocessor_unreference(p);
               return NULL;
       }
{noformat}

calling ast_taskprocessor_unreference,
which calls listener_shutdown,
which calls default_listener_shutdown,
{noformat}
default_listener_shutdown
...
               res = pthread_join(pvt->poll_thread, NULL);
{noformat}

indeed, pvt->poll_thread == -1 == 18446744073709551615

that's your segfault

By: Walter Doekes (wdoekes) 2014-08-21 04:41:57.846-0500

further, this looks like an ao2_unref too many:

{noformat}
__allocate_taskprocessor
...
       if (!(p->name = ast_strdup(name))) {
               ao2_ref(p, -1);
               return NULL;
       }
{noformat}

but that could at most cause corruption first after memory is already full

By: Walter Doekes (wdoekes) 2014-08-21 04:50:35.297-0500

@Yuriy: does {{sip show objects}} show any increase?

It looks like it crashes when a thread allocation fails because it ran out of memory.

I think you're on the right track with the memory debug. Does it keep increasing after another hour?
You can look at individual allocations with {{memory show allocations chan_sip}} (or statis) to get more info about the sources that appear to leak.

By: Yuriy Gorlichenko (ovoshlook) 2014-08-25 06:31:05.299-0500

Sorry for long feedback. sip show objects shows all my trunks.

I will recompile asterisk today with memory debug. and will send memory show allocations chan_sip here.

By: Yuriy Gorlichenko (ovoshlook) 2014-08-26 07:37:01.048-0500

Can yuo help me to dunp mmlog?

I compile Asterisk with MALLOC_DEBUG flag (menuselect->compiller flags MALLOC_DEBUG), But mmlog the same as above (at attachemets)

Can not make something deeper

By: Yuriy Gorlichenko (ovoshlook) 2014-08-29 03:27:39.060-0500

So stilll cannot take good mmlog.... Only "New Session" at the file.... (I do at cli "ON" all posible memory tbacktrace and atexit triggers)...

Leaking  keep increasing after hours. When memory debug is "ON" - asterisk crashes faster and memory usage increases wfrom 10% to 35% when crashes (after 2- 2.5 hours).

Then I can run only memory allocations only for all allocations (sip_chan gives me 0 result)....

Attached result of allocations (2 files > 10 mb..Can not attach they)

https://drive.google.com/folderview?id=0B9x8E8_-rV1gNWZkd3ZQSzdYbnc&usp=sharing

By: Walter Doekes (wdoekes) 2014-08-29 04:04:44.232-0500

Before:
{noformat}
$ awk '{print $9}' Junk/memoryallocations | sort | uniq -c | sort -n | tail -n10
  1577 endpoints.c
  2097 chan_sip.c
  2123 pbx.c
  3557 media_index.c
  3578 taskprocessor.c
  3928 xmldoc.c
  6459 format_cap.c
  7294 stasis.c
  8204 astobj2.c
 50082 /usr/local/src/asterisk-12.5.0/include/asterisk/lock.h
{noformat}
After an hour:
{noformat}
$ awk '{print $9}' Junk/memoryallocations-hour  | sort | uniq -c | sort -n | tail -n10
  3932 xmldoc.c
  6529 stasis_message_router.c
  6659 utils.c
  6901 endpoints.c
 13270 chan_sip.c
 17330 taskprocessor.c
 17995 format_cap.c
 20371 astobj2.c
 36593 stasis.c
 85346 /usr/local/src/asterisk-12.5.0/include/asterisk/lock.h
{noformat}
You have now allocated more than:
{noformat}
$ n=0; awk '{print $1}' Junk/memoryallocations-hour   | while read x; do n=$((x+n)); echo $n; done | tail -n1
729543142  (700MB)
{noformat}
So the following increases can be seen:
{noformat}
chan_sip: x6.3
stasic: x5.0
taskprocessor: x4.8
format_cap: x2.7
lock: x1.7
{noformat}
For chan_sip, that concerns these allocations that weren't in {{Junk/memoryallocations}}:
{noformat}
$ awk '{if($9=="chan_sip.c"){print $5}}' Junk/memoryallocations-hour | sort | uniq -c | sort -n
## ...
    74 __sip_reliable_xmit()
    74 sip_st_alloc()
##     90 sip_register()
   148 copy_request()
   222 sip_alloc()
  3161 proxy_from_config()   # was 497
  9480 build_peer()  # was 1493
{noformat}

So, you are leaking peers.

Are you using realtime? And if so, what settings? (rtcaching?)

Next step would be to figure out which peers are leaking. And from there you could start to examine why they aren't cleaned up.

OPTIONS? MESSAGES? MWI/NOTIFY? REGISTER? (through register=> lines? or callbackexten?)

By: Yuriy Gorlichenko (ovoshlook) 2014-08-29 05:07:23.280-0500

I Guessed that leak at peers. So I  was right at my guess.
Yes. I use realtime with rtcaching are disabled. but with rtupdate=yes. I use callbackexten.

About Next step. How to catch what cind of messages aten't cleaned up?

At "sip set debug" (As you remember - I use sip debug only for trunks) I see only REGISTER messages and Answers 200 "OK" (sometimes 100 "trying" for one peer).

By: ibercom (ibercom) 2014-08-29 05:41:55.082-0500

Yuriy: Your memory leak may be related to ASTERISK-22945 and ASTERISK-23999.

You can see a possible patch / workaround in ASTERISK-22945. Try it.


By: ibercom (ibercom) 2014-09-09 03:31:30.655-0500

Yuriy: What's about your memory leaks ?

You can look at this bug ASTERISK-24307 too. You can research about this. If you find anything, comment it.


By: Yuriy Gorlichenko (ovoshlook) 2014-09-23 16:44:22.380-0500

Hello all. I patched my asterisk with patch / workaround in ASTERISK-22945. Same result still leaking...

One more thing - I try this patch at 12.5.0 too. Same.

By: Yuriy Gorlichenko (ovoshlook) 2014-09-29 07:42:08.848-0500

What about any ideas for resolve this issue?
I must say that after patch leaking of system now running slowly than system without patch.

after 5 hours runninig system memory using have from 2.3% to 11.2% without crashing.

By: Walter Doekes (wdoekes) 2014-09-29 08:43:54.669-0500

What does 'sip show objects' and 'sip show registry' say (after a couple of hours)?

I'm tempted to blame callbackexten, since that's relatively new in conjunction with realtime.



By: Yuriy Gorlichenko (ovoshlook) 2014-09-30 05:16:31.892-0500

Thanks for response. thatis ok new features hard to deploy withoun any issues

sip show registry
voice.telphin.com:5060                  N      XXXXXXX         1785 Registered           Tue, 30 Sep 2014 14:00:59

and other peers shows  something like that

sip show objects
name: 00047135?00047135:MKZuAyrP@voice.telphin.com/00047135
objflags: 0
refcount: 2

and other peers show like that too



By: Walter Doekes (wdoekes) 2014-09-30 05:45:25.766-0500

Did you try this patch?

https://reviewboard.asterisk.org/r/4031/



By: Yuriy Gorlichenko (ovoshlook) 2014-09-30 05:55:09.179-0500

No Doesn't see this. Today I will try to comment this string at chan_sip.c
I will write about the results in a few hours.

By: Yuriy Gorlichenko (ovoshlook) 2014-09-30 06:15:58.147-0500

I went to my chan_sip.c and saw that this patch already here

/* Startup regular pokes */
               if (!devstate_only && enablepoke)
{

                       /*sip_ref_peer(peer, "schedule qualify");*/

                       sip_poke_peer(peer, 0);
              }

So I can think that result the same...


By: ibercom (ibercom) 2014-09-30 08:13:43.547-0500

Walter:

Yuri said "What about any ideas for resolve this issue?
I must say that after patch leaking of system now running slowly than system without patch.
after 5 hours runninig system memory using have from 2.3% to 11.2% without crashing."

Additionally, he has other memory leaks.

What's about [ASTERISK-24307] ?

By: Yuriy Gorlichenko (ovoshlook) 2014-09-30 10:22:48.218-0500

I dont think I have another memory leaks as at ASTERISK-24307 I have no AMI. Only realtime sip_peers (100 or more). So I must say until I have 3 realtime peers, I have no this problem.

By: Yuriy Gorlichenko (ovoshlook) 2014-10-19 23:20:46.257-0500

Hello. does you have new information about my issue?

By: Matt Jordan (mjordan) 2014-11-03 16:42:43.659-0600

The patch for  ASTERISK-22945 has been merged and will be available in 12.7.0. If you'd like, please test with the release candidate now available (12.7.0-rc1).

By: Yuriy Gorlichenko (ovoshlook) 2014-11-05 03:57:37.926-0600

installed rc 12.7.0 30 minutes ago. Started working with 2.4% of memory usage, 3.8% now...

By: Walter Doekes (wdoekes) 2014-11-05 04:06:45.627-0600

Even though you don't use AMI, you could still be reloading peers.

Could you try this one anyway?

https://reviewboard.asterisk.org/r/4114/
https://reviewboard.asterisk.org/r/4114/diff/raw/

By: Corey Farrell (coreyfarrell) 2014-11-11 14:26:20.121-0600

If you are leaking peers [REF_DEBUG|https://wiki.asterisk.org/wiki/display/AST/Reference+Count+Debugging] would be useful.

By: Corey Farrell (coreyfarrell) 2016-10-06 00:31:26.208-0500

Per the Asterisk versions page [1], the maintenance (bug fix) support for the Asterisk branch you are using has ended. For continued maintenance support please move to a supported branch of Asterisk. After testing with a supported branch, if you find this problem has not been resolved, please open a new issue against the latest version of that Asterisk branch.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions