Summary: | ASTERISK-27981: res_fax: Fax session leak with fax gatewaying | ||||
Reporter: | pasandev (pasandev) | Labels: | fax patch pjsip | ||
Date Opened: | 2018-07-23 00:22:32 | Date Closed: | 2018-08-30 05:05:04 | ||
Priority: | Major | Regression? | |||
Status: | Closed/Complete | Components: | Resources/res_fax | ||
Versions: | 13.21.1 | Frequency of Occurrence | Constant | ||
Related Issues: |
| ||||
Environment: | OS: Debian Stretch 9.4 CPU Details: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 6 On-line CPU(s) list: 0-5 Thread(s) per core: 1 Core(s) per socket: 6 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 44 Model name: Westmere E56xx/L56xx/X56xx (Nehalem-C) Stepping: 1 CPU MHz: 2933.612 BogoMIPS: 5867.22 Hypervisor vendor: KVM Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 4096K L3 cache: 16384K NUMA node0 CPU(s): 0-5 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm kaiser tsc_adjust arat | Attachments: | ( 0) ASTERISK-27981.tar.gz ( 1) ASTERISK-27981-2.tar.gz ( 2) console ( 3) extensions.conf ( 4) fax-error-reference.diff ( 5) mem_alloc ( 6) mem_alloc240718 ( 7) mem_snap0822 ( 8) mem_snap0822_2 ( 9) mem_snap0823 (10) mem_sum (11) mem_sum240718 (12) messages.xz (13) mmlog (14) module_show (15) pjsip.conf | ||
Description: | possible memory leak on res_fax_spandsp.c according to "memory show summary" command, mem allocated to res_fax_spandsp.c keeps growing.
I have a simple gateway setup to terminate calls from/to supplier using PJSIP Channel driver. Currently handling supplier -> Asterisk -> Customer calls. Customer <> Asterisk <> Supplier | ||||
Comments: | By: Asterisk Team (asteriskteam) 2018-07-23 00:22:35.388-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. By: pasandev (pasandev) 2018-07-23 00:24:47.983-0500 memory show allocations command output By: pasandev (pasandev) 2018-07-23 00:25:54.286-0500 memory show summary command output By: pasandev (pasandev) 2018-07-23 00:46:14.194-0500 /var/log/asterisk/mmlog file By: Joshua C. Colp (jcolp) 2018-07-23 04:34:44.258-0500 What is actually being experienced? Is it running out of memory? What is the console log? How many calls are in progress? By: pasandev (pasandev) 2018-07-23 04:49:21.656-0500 Hi Joshua, The memory usage keep growing on the server. We have rrd base graph monitoring and it indicates the memory growth. I took a 'core show calls' snapshot of the server right now and its as below, 15 active calls 60637 calls processed what bit of console log do you want ? By: Joshua C. Colp (jcolp) 2018-07-23 04:59:19.563-0500 A flow to show a normal call as it goes through the system just to confirm things. As well is the log from that same system? What is the output of "module show"? By: pasandev (pasandev) 2018-07-23 05:17:06.820-0500 HI Jousha, yup, logs are from same system. I have upload the module show output with module_show file. Also attach the console output after removing sensitive data. This system handles both fax/voice calls By: pasandev (pasandev) 2018-07-23 22:01:42.998-0500 Hi Josh, I have attached memory summary and memory allocations command output for today in files mem_sum240718 and mem_alloc240718 respectively. calls at that time is as follow, 44 active calls 69089 calls processed By: pasandev (pasandev) 2018-07-25 00:17:23.976-0500 HI Josh, Is the leak confirmed ? I can see you have link SWP-10420 to this, But I don't have access to that info :) By: Joshua C. Colp (jcolp) 2018-07-25 04:22:43.142-0500 The SWP is for internal planning purposes. While I haven't confirmed the leak through testing there does appear to be one according to the information. There is also no timeframe on when it will get looked into. By: pasandev (pasandev) 2018-07-25 04:26:25.919-0500 Thanks for the update Josh By: pasandev (pasandev) 2018-08-06 22:50:59.993-0500 Hi Guys, Just following up on this, do we have an ETA on this as to when it'll get fix or upgrading to latest would fix this ? By: Joshua C. Colp (jcolp) 2018-08-07 04:09:33.015-0500 There is still no timeframe on when this will be resolved. Asking does not change that answer. By: Joshua C. Colp (jcolp) 2018-08-08 09:10:33.613-0500 Can you please follow the instructions on the wiki[1] for getting reference count information to understand where things are going awry. [1] https://wiki.asterisk.org/wiki/display/AST/Reference+Count+Debugging By: pasandev (pasandev) 2018-08-14 01:38:27.254-0500 Hi Josh, I have taken the reference count log and parse it using the script, the compressed file is too large to upload here so I have put it in https://ufile.io/0bz3a Please download it from there. By: Joshua C. Colp (jcolp) 2018-08-14 05:34:24.685-0500 There doesn't appear to be any leaks of the previously shown fax sessions in this reference count log. Can you provide the raw refs file, as well as build with MALLOC_DEBUG and confirm they have been leaked? By: pasandev (pasandev) 2018-08-14 08:43:32.284-0500 Hi Josh, This ref count log took after asterisk being up for few hours. Memory leak can be observe after about 1 day of continuous running But I can't log that much data as /var/log/asterisk/refs grows rapidly. I have upload the raw refs file which is 6GB in total. Is there any alternate way to get this leak info without much logging ?. This build is also as MALLOC_DEBUG flag on. you can download the raw file from https://ufile.io/4g448 By: Joshua C. Colp (jcolp) 2018-08-14 08:47:49.224-0500 Not really, the problem is that I need to see why the fax session isn't going away by examining its usage through the reference count log to see where it is getting leaked. The one you provided didn't show any leak so I wasn't able to use it to determine that. By: Joshua C. Colp (jcolp) 2018-08-14 08:53:29.659-0500 I'll see if I can make an isolated patch with log messages to better understand it though. By: pasandev (pasandev) 2018-08-14 09:07:35.587-0500 Hi Josh, Thanks. I'll see if I can get a better log from my end. If you provide a patch to isolate these messages I'll try that too. By: Joshua C. Colp (jcolp) 2018-08-14 09:22:19.545-0500 This is a patch which adds error level messages to allow tracking of the creation/destruction of fax sessions. Please apply it and provide the log once they start leaking. By: pasandev (pasandev) 2018-08-14 21:26:49.994-0500 Sure thing Josh By: pasandev (pasandev) 2018-08-22 08:46:02.908-0500 mem_snap0822 -- memory summary at the begining mem_snap0822_2 -- memory summary at the end By: pasandev (pasandev) 2018-08-22 08:48:23.684-0500 Hi Josh, I have upload the messages file after running asterisk for more than 24 hours and confirming there's a difference in mem difference of 76M in res_fax_spandsp.c Let me know if you need the full log file I can send it to you privately. By: Kevin Harwell (kharwell) 2018-08-22 09:15:33.490-0500 [~pasandev], can you also please provide the output again from: {noformat} *CLI> memory show allocations {noformat} By: pasandev (pasandev) 2018-08-22 22:17:24.913-0500 output of 'memory show allocations' By: pasandev (pasandev) 2018-08-22 22:18:21.424-0500 HI Kevin, I have attached the output of 'memory show allocations' in mem_snap0823 file. By: Friendly Automation (friendly-automation) 2018-08-30 05:05:05.732-0500 Change 10015 merged by Joshua Colp: res_fax: Handle fax gateway being started more than once. [https://gerrit.asterisk.org/10015|https://gerrit.asterisk.org/10015] By: Friendly Automation (friendly-automation) 2018-08-30 05:43:55.500-0500 Change 10016 merged by Joshua Colp: res_fax: Handle fax gateway being started more than once. [https://gerrit.asterisk.org/10016|https://gerrit.asterisk.org/10016] By: Friendly Automation (friendly-automation) 2018-08-30 05:44:22.852-0500 Change 10017 merged by Joshua Colp: res_fax: Handle fax gateway being started more than once. [https://gerrit.asterisk.org/10017|https://gerrit.asterisk.org/10017] By: Friendly Automation (friendly-automation) 2018-08-30 05:44:37.906-0500 Change 10018 merged by Joshua Colp: res_fax: Handle fax gateway being started more than once. [https://gerrit.asterisk.org/10018|https://gerrit.asterisk.org/10018] By: pasandev (pasandev) 2018-08-30 21:13:54.189-0500 HI Josh, After we apply this patch asterisk starts to crash repeatedly while processing calls. [6344012.910997] asterisk[321]: segfault at 29b10 ip 00007f3b3e6c1b5e sp 00007f3b38fe78a0 error 4 in res_fax_spandsp.so[7f3b3e6bd000+7000] [6344164.373750] asterisk[556]: segfault at 2c7b4 ip 00007f8fbc61afc0 sp 00007f8f8a55a8b8 error 4 in libspandsp.so.2.0.0[7f8fbc5c5000+c0000] [6344337.538865] asterisk[817]: segfault at 29b10 ip 00007fb54ee6ab5e sp 00007fb603cbb8a0 error 4 in res_fax_spandsp.so[7fb54ee66000+7000] [6344788.746705] asterisk[1459]: segfault at 29b10 ip 00007f7f3c34fb5e sp 00007f7f0a0068a0 error 4 in res_fax_spandsp.so[7f7f3c34b000+7000] [6344860.316757] asterisk[1562]: segfault at 29b10 ip 00007f88c3476b5e sp 00007f88d80818a0 error 4 in res_fax_spandsp.so[7f88c3472000+7000] [6344891.478264] asterisk[1670]: segfault at 2c7b4 ip 00007f1eb861afc0 sp 00007f1e87e0e988 error 4 in libspandsp.so.2.0.0[7f1eb85c5000+c0000] [6344942.778812] asterisk[1775]: segfault at 29b10 ip 00007f943746cb5e sp 00007f9426ddc8a0 error 4 in res_fax_spandsp.so[7f9437468000+7000] [6345205.112373] asterisk[2155]: segfault at 2c7b4 ip 00007fad5e449fc0 sp 00007fad59e8d988 error 4 in libspandsp.so.2.0.0[7fad5e3f4000+c0000] [6345519.041308] asterisk[2520]: segfault at 29b10 ip 00007f78324b3b5e sp 00007f782ceac8a0 error 4 in res_fax_spandsp.so[7f78324af000+7000] By: Asterisk Team (asteriskteam) 2018-08-30 21:13:54.573-0500 This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable. By: Joshua C. Colp (jcolp) 2018-08-31 04:45:39.488-0500 Please provide a new backtrace showing this (and also ensure you're using the latest spandsp too). I ran through the tests we have for the scenarios for this and did not experience crashing. By: pasandev (pasandev) 2018-08-31 05:20:55.360-0500 Hi Josh, Sure, I'll verfiy if we have use the latest spandsp, I didn't build the binary, will get back to you soon. By: pasandev (pasandev) 2018-09-02 20:44:10.700-0500 Hi Josh, We used spandsp lib version 0.0.6. Which version are using recommend to use ? . 0.0.6 is the latest as we can see. By: Joshua C. Colp (jcolp) 2018-09-03 05:47:42.806-0500 As long as it's the latest that is fine, but I do need the backtrace. I ran all the tests multiple times and couldn't recreate your crash. By: pasandev (pasandev) 2018-09-09 21:21:02.757-0500 core dump backtrace By: pasandev (pasandev) 2018-09-09 21:30:04.141-0500 Hi Josh, I couldn't replicate the original crash, But getting a different crash on simulating following dialplan with sipp dummy calls after applying the provided patch above. I'll try to get the original crash reproduce again. extensions.conf {noformat} [inbound] exten => _X.,1,NoOp("Incoming CLI: ${CALLERID(num)}") exten => _X.,n,ExecIf($["${DB_EXISTS(spamcli/${CALLERID(num)})}"="1"]?Hangup()) exten => _X.,n,Set(__CALLID=xxxxxxxxxxxx) exten => _X.,n,Set(CDR(callid)=${CALLID}) exten => _X.,n,ExecIf($["${CALLERID(num)}"="anonymous"]?Set(CALLERID(num-pres)=prohib)) exten => _X.,n,Set(CDR(remote_hostinfo)=${CUT(CUT(PJSIP_HEADER(read,Via,1), ,2),\;,1)}) exten => _X.,n,Set(CDR(dst_hostinfo)=x.x.x.x:5060) exten => _X.,n,Dial(PJSIP/${EXTEN}@provider,90,b(callid^addcallid^1)) [callid] exten => addcallid,1,Set(PJSIP_HEADER(add,X-CALLID)=${CALLID}) exten => addcallid,n,Return() {noformat} pjsip.conf {noformat} [provider] type=endpoint transport=transport-udp context=outbound tone_zone=au ;dtmf_mode=auto disallow=all allow=alaw allow=ulaw allow=g729 timers=no t38_udptl=yes t38_udptl_ec=fec t38_udptl_maxdatagram=400 set_var=FAXOPT(gateway)=yes,20 send_rpid=no trust_id_inbound=yes rtp_timeout=60 rtp_timeout_hold=120 aors=provider force_rport=yes ;It's a good idea to read the configuration help for each direct_media=no ;of these options. identify_by=ip [provider] type=aor contact=sip:x.x.x.x:5060 qualify_frequency=5 [provider] type=identify endpoint=provider match=x.x.x.x {noformat} By: pasandev (pasandev) 2018-09-10 23:45:30.892-0500 coredump extract for which asterisk segfault with following error asterisk[13799]: segfault at 29b10 ip 00007f382f71f618 sp 00007f38277055a0 error 4 in res_fax_spandsp.so[7f382f71a000+8000] By: pasandev (pasandev) 2018-10-02 10:50:23.344-0500 anyone got a chance to look at the reason for crash ? By: Joshua C. Colp (jcolp) 2018-10-02 10:59:55.398-0500 If there is any update it will be posted here. By: Alexei Gradinari (alexei gradinari) 2019-05-28 16:15:25.760-0500 Added patch which should fix this segfaults. https://gerrit.asterisk.org/11417 By: Friendly Automation (friendly-automation) 2019-06-04 05:09:26.170-0500 Change 11434 merged by Friendly Automation: res_fax: fix segfault on inactive "reserved" fax session [https://gerrit.asterisk.org/c/asterisk/+/11434|https://gerrit.asterisk.org/c/asterisk/+/11434] By: Friendly Automation (friendly-automation) 2019-06-04 05:31:16.122-0500 Change 11433 merged by Joshua Colp: res_fax: fix segfault on inactive "reserved" fax session [https://gerrit.asterisk.org/c/asterisk/+/11433|https://gerrit.asterisk.org/c/asterisk/+/11433] By: Friendly Automation (friendly-automation) 2019-06-04 05:31:47.921-0500 Change 11417 merged by Joshua Colp: res_fax: fix segfault on inactive "reserved" fax session [https://gerrit.asterisk.org/c/asterisk/+/11417|https://gerrit.asterisk.org/c/asterisk/+/11417] |