[Home]

Summary:ASTERISK-27981: res_fax: Fax session leak with fax gatewaying
Reporter:pasandev (pasandev)Labels:fax patch pjsip
Date Opened:2018-07-23 00:22:32Date Closed:2018-08-30 05:05:04
Priority:MajorRegression?
Status:Closed/CompleteComponents:Resources/res_fax
Versions:13.21.1 Frequency of
Occurrence
Constant
Related
Issues:
is duplicated byASTERISK-28450 Program terminated with signal 11, Segmentation fault at t38_gateway.c:2189
Environment:OS: Debian Stretch 9.4 CPU Details: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 6 On-line CPU(s) list: 0-5 Thread(s) per core: 1 Core(s) per socket: 6 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 44 Model name: Westmere E56xx/L56xx/X56xx (Nehalem-C) Stepping: 1 CPU MHz: 2933.612 BogoMIPS: 5867.22 Hypervisor vendor: KVM Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 4096K L3 cache: 16384K NUMA node0 CPU(s): 0-5 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm kaiser tsc_adjust arat Attachments:( 0) ASTERISK-27981.tar.gz
( 1) ASTERISK-27981-2.tar.gz
( 2) console
( 3) extensions.conf
( 4) fax-error-reference.diff
( 5) mem_alloc
( 6) mem_alloc240718
( 7) mem_snap0822
( 8) mem_snap0822_2
( 9) mem_snap0823
(10) mem_sum
(11) mem_sum240718
(12) messages.xz
(13) mmlog
(14) module_show
(15) pjsip.conf
Description:possible memory leak on res_fax_spandsp.c according to "memory show summary" command, mem allocated to res_fax_spandsp.c keeps growing.

I have a simple gateway setup to terminate calls from/to supplier using PJSIP Channel driver. Currently handling supplier -> Asterisk -> Customer calls.

Customer <> Asterisk <> Supplier
Comments:By: Asterisk Team (asteriskteam) 2018-07-23 00:22:35.388-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: pasandev (pasandev) 2018-07-23 00:24:47.983-0500

memory show allocations command output

By: pasandev (pasandev) 2018-07-23 00:25:54.286-0500

memory show summary command output

By: pasandev (pasandev) 2018-07-23 00:46:14.194-0500

/var/log/asterisk/mmlog file

By: Joshua C. Colp (jcolp) 2018-07-23 04:34:44.258-0500

What is actually being experienced? Is it running out of memory? What is the console log? How many calls are in progress?

By: pasandev (pasandev) 2018-07-23 04:49:21.656-0500

Hi Joshua,
The memory usage keep growing on the server. We have rrd base graph monitoring and it indicates the memory growth.
I took a 'core show calls' snapshot of the server right now and its as below,

15 active calls
60637 calls processed

what bit of console log do you want ?

By: Joshua C. Colp (jcolp) 2018-07-23 04:59:19.563-0500

A flow to show a normal call as it goes through the system just to confirm things.

As well is the log from that same system? What is the output of "module show"?

By: pasandev (pasandev) 2018-07-23 05:17:06.820-0500

HI Jousha,
yup, logs are from same system.
I have upload the module show output with module_show file. Also attach the console output after removing sensitive data.
This system handles both fax/voice calls

By: pasandev (pasandev) 2018-07-23 22:01:42.998-0500

Hi Josh,

I have attached memory summary and memory allocations command output for today in files mem_sum240718  and mem_alloc240718 respectively.
calls at that time is as follow,
44 active calls
69089 calls processed


By: pasandev (pasandev) 2018-07-25 00:17:23.976-0500

HI Josh,
Is the leak confirmed ? I can see you have link  SWP-10420 to this, But I don't have access to that info :)

By: Joshua C. Colp (jcolp) 2018-07-25 04:22:43.142-0500

The SWP is for internal planning purposes. While I haven't confirmed the leak through testing there does appear to be one according to the information. There is also no timeframe on when it will get looked into.

By: pasandev (pasandev) 2018-07-25 04:26:25.919-0500

Thanks for the update Josh

By: pasandev (pasandev) 2018-08-06 22:50:59.993-0500

Hi Guys,

Just following up on this, do we have an ETA on this as to when it'll get fix or upgrading to latest would fix this ?

By: Joshua C. Colp (jcolp) 2018-08-07 04:09:33.015-0500

There is still no timeframe on when this will be resolved. Asking does not change that answer.

By: Joshua C. Colp (jcolp) 2018-08-08 09:10:33.613-0500

Can you please follow the instructions on the wiki[1] for getting reference count information to understand where things are going awry.

[1] https://wiki.asterisk.org/wiki/display/AST/Reference+Count+Debugging

By: pasandev (pasandev) 2018-08-14 01:38:27.254-0500

Hi Josh,

I have taken the reference count log and parse it using the script, the compressed file is too large to upload here so I have put it in
https://ufile.io/0bz3a

Please download it from there.

By: Joshua C. Colp (jcolp) 2018-08-14 05:34:24.685-0500

There doesn't appear to be any leaks of the previously shown fax sessions in this reference count log. Can you provide the raw refs file, as well as build with MALLOC_DEBUG and confirm they have been leaked?

By: pasandev (pasandev) 2018-08-14 08:43:32.284-0500

Hi Josh,

This ref count log took after asterisk being up for few hours. Memory leak can be observe after about 1 day of continuous running But I can't log that much data as /var/log/asterisk/refs grows rapidly.  I have upload the raw refs file which is 6GB in total. Is there any alternate way to get this leak info without much logging ?. This build is also as MALLOC_DEBUG flag on.

you can download the raw file from https://ufile.io/4g448

By: Joshua C. Colp (jcolp) 2018-08-14 08:47:49.224-0500

Not really, the problem is that I need to see why the fax session isn't going away by examining its usage through the reference count log to see where it is getting leaked. The one you provided didn't show any leak so I wasn't able to use it to determine that.

By: Joshua C. Colp (jcolp) 2018-08-14 08:53:29.659-0500

I'll see if I can make an isolated patch with log messages to better understand it though.

By: pasandev (pasandev) 2018-08-14 09:07:35.587-0500

Hi Josh,

Thanks. I'll see if I can get a better log from my end. If you provide a patch to isolate these messages I'll try that too.

By: Joshua C. Colp (jcolp) 2018-08-14 09:22:19.545-0500

This is a patch which adds error level messages to allow tracking of the creation/destruction of fax sessions. Please apply it and provide the log once they start leaking.

By: pasandev (pasandev) 2018-08-14 21:26:49.994-0500

Sure thing Josh

By: pasandev (pasandev) 2018-08-22 08:46:02.908-0500

mem_snap0822 -- memory summary at the begining
mem_snap0822_2 -- memory summary at the end


By: pasandev (pasandev) 2018-08-22 08:48:23.684-0500

Hi Josh,

I have upload the messages file after running asterisk for more than 24 hours and confirming there's a difference in mem difference of 76M in res_fax_spandsp.c

Let me know if you need the full log file I can send it to you privately.


By: Kevin Harwell (kharwell) 2018-08-22 09:15:33.490-0500

[~pasandev], can you also please provide the output again from:
{noformat}
*CLI> memory show allocations
{noformat}


By: pasandev (pasandev) 2018-08-22 22:17:24.913-0500

output of 'memory show allocations'

By: pasandev (pasandev) 2018-08-22 22:18:21.424-0500

HI Kevin,

I have attached the output of 'memory show allocations' in mem_snap0823 file.


By: Friendly Automation (friendly-automation) 2018-08-30 05:05:05.732-0500

Change 10015 merged by Joshua Colp:
res_fax: Handle fax gateway being started more than once.

[https://gerrit.asterisk.org/10015|https://gerrit.asterisk.org/10015]

By: Friendly Automation (friendly-automation) 2018-08-30 05:43:55.500-0500

Change 10016 merged by Joshua Colp:
res_fax: Handle fax gateway being started more than once.

[https://gerrit.asterisk.org/10016|https://gerrit.asterisk.org/10016]

By: Friendly Automation (friendly-automation) 2018-08-30 05:44:22.852-0500

Change 10017 merged by Joshua Colp:
res_fax: Handle fax gateway being started more than once.

[https://gerrit.asterisk.org/10017|https://gerrit.asterisk.org/10017]

By: Friendly Automation (friendly-automation) 2018-08-30 05:44:37.906-0500

Change 10018 merged by Joshua Colp:
res_fax: Handle fax gateway being started more than once.

[https://gerrit.asterisk.org/10018|https://gerrit.asterisk.org/10018]

By: pasandev (pasandev) 2018-08-30 21:13:54.189-0500

HI Josh,

After we apply this patch asterisk starts to crash repeatedly while processing calls.

[6344012.910997] asterisk[321]: segfault at 29b10 ip 00007f3b3e6c1b5e sp 00007f3b38fe78a0 error 4 in res_fax_spandsp.so[7f3b3e6bd000+7000]
[6344164.373750] asterisk[556]: segfault at 2c7b4 ip 00007f8fbc61afc0 sp 00007f8f8a55a8b8 error 4 in libspandsp.so.2.0.0[7f8fbc5c5000+c0000]
[6344337.538865] asterisk[817]: segfault at 29b10 ip 00007fb54ee6ab5e sp 00007fb603cbb8a0 error 4 in res_fax_spandsp.so[7fb54ee66000+7000]
[6344788.746705] asterisk[1459]: segfault at 29b10 ip 00007f7f3c34fb5e sp 00007f7f0a0068a0 error 4 in res_fax_spandsp.so[7f7f3c34b000+7000]
[6344860.316757] asterisk[1562]: segfault at 29b10 ip 00007f88c3476b5e sp 00007f88d80818a0 error 4 in res_fax_spandsp.so[7f88c3472000+7000]
[6344891.478264] asterisk[1670]: segfault at 2c7b4 ip 00007f1eb861afc0 sp 00007f1e87e0e988 error 4 in libspandsp.so.2.0.0[7f1eb85c5000+c0000]
[6344942.778812] asterisk[1775]: segfault at 29b10 ip 00007f943746cb5e sp 00007f9426ddc8a0 error 4 in res_fax_spandsp.so[7f9437468000+7000]
[6345205.112373] asterisk[2155]: segfault at 2c7b4 ip 00007fad5e449fc0 sp 00007fad59e8d988 error 4 in libspandsp.so.2.0.0[7fad5e3f4000+c0000]
[6345519.041308] asterisk[2520]: segfault at 29b10 ip 00007f78324b3b5e sp 00007f782ceac8a0 error 4 in res_fax_spandsp.so[7f78324af000+7000]


By: Asterisk Team (asteriskteam) 2018-08-30 21:13:54.573-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Joshua C. Colp (jcolp) 2018-08-31 04:45:39.488-0500

Please provide a new backtrace showing this (and also ensure you're using the latest spandsp too). I ran through the tests we have for the scenarios for this and did not experience crashing.

By: pasandev (pasandev) 2018-08-31 05:20:55.360-0500

Hi Josh,

Sure, I'll verfiy if we have use the latest spandsp, I didn't build the binary, will get back to you soon.

By: pasandev (pasandev) 2018-09-02 20:44:10.700-0500

Hi Josh,

We used spandsp lib version 0.0.6. Which version are using recommend to use ? . 0.0.6 is the latest as we can see.


By: Joshua C. Colp (jcolp) 2018-09-03 05:47:42.806-0500

As long as it's the latest that is fine, but I do need the backtrace. I ran all the tests multiple times and couldn't recreate your crash.

By: pasandev (pasandev) 2018-09-09 21:21:02.757-0500

core dump backtrace

By: pasandev (pasandev) 2018-09-09 21:30:04.141-0500

Hi Josh,

I couldn't replicate the original crash, But getting a different crash on simulating following dialplan with sipp dummy calls after applying the provided patch above. I'll try to get the original crash reproduce again.

extensions.conf
{noformat}
[inbound]
exten => _X.,1,NoOp("Incoming CLI: ${CALLERID(num)}")
exten => _X.,n,ExecIf($["${DB_EXISTS(spamcli/${CALLERID(num)})}"="1"]?Hangup())
exten => _X.,n,Set(__CALLID=xxxxxxxxxxxx)
exten => _X.,n,Set(CDR(callid)=${CALLID})
exten => _X.,n,ExecIf($["${CALLERID(num)}"="anonymous"]?Set(CALLERID(num-pres)=prohib))
exten => _X.,n,Set(CDR(remote_hostinfo)=${CUT(CUT(PJSIP_HEADER(read,Via,1), ,2),\;,1)})
exten => _X.,n,Set(CDR(dst_hostinfo)=x.x.x.x:5060)
exten => _X.,n,Dial(PJSIP/${EXTEN}@provider,90,b(callid^addcallid^1))


[callid]
exten => addcallid,1,Set(PJSIP_HEADER(add,X-CALLID)=${CALLID})
exten => addcallid,n,Return()
{noformat}

pjsip.conf
{noformat}
[provider]
type=endpoint
transport=transport-udp
context=outbound
tone_zone=au
;dtmf_mode=auto
disallow=all
allow=alaw
allow=ulaw
allow=g729
timers=no
t38_udptl=yes
t38_udptl_ec=fec
t38_udptl_maxdatagram=400
set_var=FAXOPT(gateway)=yes,20
send_rpid=no
trust_id_inbound=yes
rtp_timeout=60
rtp_timeout_hold=120
aors=provider
force_rport=yes    ;It's a good idea to read the configuration help for each
direct_media=no    ;of these options.
identify_by=ip

[provider]
type=aor
contact=sip:x.x.x.x:5060
qualify_frequency=5

[provider]
type=identify
endpoint=provider
match=x.x.x.x
{noformat}


By: pasandev (pasandev) 2018-09-10 23:45:30.892-0500

coredump extract for which asterisk segfault with following error

asterisk[13799]: segfault at 29b10 ip 00007f382f71f618 sp 00007f38277055a0 error 4 in res_fax_spandsp.so[7f382f71a000+8000]


By: pasandev (pasandev) 2018-10-02 10:50:23.344-0500

anyone got a chance to look at the reason for crash ?

By: Joshua C. Colp (jcolp) 2018-10-02 10:59:55.398-0500

If there is any update it will be posted here.

By: Alexei Gradinari (alexei gradinari) 2019-05-28 16:15:25.760-0500

Added patch which should fix this segfaults.
https://gerrit.asterisk.org/11417


By: Friendly Automation (friendly-automation) 2019-06-04 05:09:26.170-0500

Change 11434 merged by Friendly Automation:
res_fax: fix segfault on inactive "reserved" fax session

[https://gerrit.asterisk.org/c/asterisk/+/11434|https://gerrit.asterisk.org/c/asterisk/+/11434]

By: Friendly Automation (friendly-automation) 2019-06-04 05:31:16.122-0500

Change 11433 merged by Joshua Colp:
res_fax: fix segfault on inactive "reserved" fax session

[https://gerrit.asterisk.org/c/asterisk/+/11433|https://gerrit.asterisk.org/c/asterisk/+/11433]

By: Friendly Automation (friendly-automation) 2019-06-04 05:31:47.921-0500

Change 11417 merged by Joshua Colp:
res_fax: fix segfault on inactive "reserved" fax session

[https://gerrit.asterisk.org/c/asterisk/+/11417|https://gerrit.asterisk.org/c/asterisk/+/11417]