Summary: ASTERISK-15359: [patch] Segmentation fault using manager http MXML
Reporter: nik600 (nik600)
Labels:
Date Opened: 2009-12-23 04:00:10.000-0600
Date Closed: 2011-01-20 10:40:34.000-0600
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/ManagerInterface
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 20100629__issue16506.diff.txt
( 1) backtrace.1.4.26.3.txt
( 2) backtrace.1.4.27.txt
( 3) backtrace.txt
( 4) backtrace2.txt
( 5) backtrace3.txt
( 6) manager-1.4-v3.patch
( 7) manager-1.4-v5.patch
( 8) manager-1.6.1-v1.patch
Description:
Dear all, I'm experiencing a problem with the manager HTTP MXML interface.

I'm using Asterisk 1.4.26.2 on Slackware 13.0.

On a system that receives about 1000-1200 calls per day, this happens about once per day.

These are the HTTP MXML requests used:

action=login
action=queuestatus
action=QueueAdd
action=QueueRemove
action=SipPeers

I'm trying to reproduce the problem, but so far I haven't figured out how.


****** ADDITIONAL INFORMATION ******

I've also generated a dump using gdb; the problem seems to be in manager.c:
#0  xml_translate (in=0xb5cf7000 <Address 0xb5cf7000 out of bounds>, vars=0xd0d0d0d) at manager.c:396
396 for (x = 0; in[x]; x++) {
(gdb) bt
#0  xml_translate (in=0xb5cf7000 <Address 0xb5cf7000 out of bounds>, vars=0xd0d0d0d) at manager.c:396
#1  0x080c01cd in generic_http_callback (format=2, requestor=<value optimized out>, uri=<value optimized out>, params=0x8408a90,
   status=0xb5ad2344, title=0xb5ad2348, contentlength=0xb5ad2340) at manager.c:2890
#2  0x080ac53d in ast_httpd_helper_thread (data=0xb5852938) at http.c:369
#3  0x0810371b in dummy_start (data=0xb5840888) at utils.c:856
#4  0xb7f5d310 in start_thread () from /lib/libpthread.so.0
#5  0xb728abee in clone () from /lib/libc.so.6


Here are some lines from the relevant section:

       for (x = 0; in[x]; x++) {
               if (in[x] == ':')
                       colons++;
               else if (in[x] == '\n')
                       breaks++;
               else if (strchr("&\"<>\'", in[x]))
                       escaped++;
       }

The problem seems to be the assumption behind the in[x] test, namely that the buffer is NUL-terminated.
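
The loop quoted above stops only when it reads a zero byte, so it relies on the buffer being NUL-terminated. A minimal sketch of a length-bounded variant follows, assuming the caller knows how many bytes are valid. The `count_specials` function, the `struct counts` type, and the `len` parameter are hypothetical names for illustration and are not part of manager.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; manager.c keeps these as local ints. */
struct counts {
    int colons;
    int breaks;
    int escaped;
};

/*
 * Count the characters the quoted loop cares about, but never read
 * past `len` bytes even if no terminating NUL is present.
 */
static struct counts count_specials(const char *in, size_t len)
{
    struct counts c = { 0, 0, 0 };
    size_t x;

    for (x = 0; x < len && in[x] != '\0'; x++) {
        if (in[x] == ':')
            c.colons++;
        else if (in[x] == '\n')
            c.breaks++;
        else if (memchr("&\"<>'", in[x], 5) != NULL)
            c.escaped++;
    }
    return c;
}
```

With an explicit bound, a missing terminator makes the scan stop at `len` instead of running into unmapped memory.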
Comments:

By: nik600 (nik600) 2009-12-23 07:22:36.000-0600

I've noticed that something regarding the manager was fixed in 1.4.28.

I've upgraded and am testing that version.

By: Leif Madsen (lmadsen) 2010-01-04 13:17:38.000-0600

Thanks for the info. I'm moving this to Feedback and if you can report back whether the issue is resolved or not, that'd be useful. I'll leave this open for about a week, and if I don't hear back will close.

If you can provide a backtrace (if it crashes again) with DONT_OPTIMIZE enabled in the Compiler Flags of menuselect, that'd be useful. More information in the doc/backtrace.txt file in your Asterisk source.

Thanks!

By: nik600 (nik600) 2010-01-04 13:44:02.000-0600

In the test environment I've upgraded to 1.4.28 and the problem doesn't appear.

However, I don't have much load on the test environment; the problem seems to be related to heavy, varied activity on the http/mxml interface.

We have planned the production upgrade for next week; I hope to give you feedback as soon as possible.

Thanks

By: Leif Madsen (lmadsen) 2010-01-04 14:38:42.000-0600

Great thanks. I will leave this issue open for now until you are able to provide additional information. Thanks!

By: Leif Madsen (lmadsen) 2010-03-23 09:41:43

Pinging the reporter for additional feedback on this issue prior to suspending it.

By: nik600 (nik600) 2010-03-23 09:49:51

Feel free to suspend it. I'm not able to reproduce it in the test environment, and in production we are still using 1.4.26.2. With that version it happened 2-3 times per week (looking at the average over the last 3 months).

Next month I'll start a new project (I'll probably use 1.4.30 directly); if the problem persists I'll keep you updated.

Sorry about that; I cannot force the customer to change the production version.

By: Evandro César Arruda (ecarruda) 2010-03-23 13:59:10

People,

I'm running Asterisk 1.4.26.1, 1.4.26.2 and 1.4.30 and I'm having the same problem. I see it when running Dashboard using rawman of the Asterisk manager. See the data:

The end of bt full, and bt:

_buffer = {__routine = 0x8069000 <ast_unregister_thread>, __arg = 0xb44a7bb0, __canceltype = 0, __prev = 0x0}
ret = <value optimized out>
#4  0xb7f85240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#5  0xb707449e in clone () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
(gdb) bt
#0  0xb7014463 in strlen () from /lib/tls/i686/cmov/libc.so.6
#1  0x080bb954 in generic_http_callback (format=0, requestor=0xb483cd28, uri=<value optimized out>, params=0xb3b00770, status=0xb44a7404, title=0xb44a7408, contentlength=0xb44a7400) at manager.c:2914
#2  0x080ab2ab in ast_httpd_helper_thread (data=0xb483cd20) at http.c:369
#3  0x08102b50 in dummy_start (data=0xb4818c50) at utils.c:856
#4  0xb7f85240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#5  0xb707449e in clone () from /lib/tls/i686/cmov/libc.so.6


If you need anything more, please let me know.

By: Leif Madsen (lmadsen) 2010-03-24 10:47:00

ecarruda: your backtrace appears to have been created without DONT_OPTIMIZE enabled, which makes it less than useful.

Please *attach* a backtrace per the doc/backtrace.txt file in your Asterisk source if you can reproduce this on 1.4.30.

By: Evandro César Arruda (ecarruda) 2010-03-24 10:53:08

Hey Lmadsen,

I will try to do it in my lab, because this is happening at a customer site and I can't enable DONT_OPTIMIZE there: it handles 10,000 calls per day and 12 E1 ports.

If I can't do it in my lab, I will choose that as a last resort. Today I changed the HTTP Manager port and have had no more problems.

I'll send new feedback; thanks for your attention.

Evandro

By: Konrad Rozycki (krdian) 2010-03-25 10:46:13

I have had the same issue since upgrading to 1.4.30. You can find details in the attached backtrace file.

By: Evandro César Arruda (ecarruda) 2010-03-25 11:39:29

Perfect,

Reading the source and the bt, I can see some things about the problem. It is related to pthread control: something is driving the thread control crazy. In the pthread pop call we can see the error about killing a thread when that thread no longer exists.

If I have no other choice, I will recompile at my customer's site, but some people will kill me, ahehaehae.

Just so you know, most of the time I'm using:

Status
QueueStatus

Thanks for your help

By: Konrad Rozycki (krdian) 2010-03-26 10:03:04

I'm using status & queuestatus as well. Connection to http through proxypass.

By: nik600 (nik600) 2010-03-26 10:06:47

As reported in my first message, I'm using:

action=login
action=queuestatus
action=QueueAdd
action=QueueRemove
action=SipPeers

So the problem is probably in queuestatus?

By: Konrad Rozycki (krdian) 2010-03-29 10:02:30

I'm afraid that my problems are connected with vmware/blade. I have moved my call centre onto a virtual machine. In my case action=(queuestatus,status,waitevent,etc...) are experiencing problems. Anyway, I'll try Asterisk 1.4.21.2, which works stably on a fixed machine.

By: nik600 (nik600) 2010-03-29 10:09:46

i'm using vmware too (both on blade and DL380).

By: Evandro César Arruda (ecarruda) 2010-03-30 10:08:07

People,

I'm having this problem at all my customers running an AJAM connection; I always use rawman and request status and queuestatus.

Tomorrow I will enable debug symbols at my customer's site for us.

Thanks.

By: Evandro César Arruda (ecarruda) 2010-03-31 14:46:51

People,

I removed optimization and enabled debug symbols on my customer's Asterisk. Today, after disabling optimization, I didn't have any problem with AJAM.

I will wait until tomorrow to see one more day of operation. Could it be something in the optimized code?

Many Thanks.

By: Evandro César Arruda (ecarruda) 2010-04-01 13:53:05

Second day running with DONT_OPTIMIZE enabled, and no errors so far.

Regards,
Evandro

By: Evandro César Arruda (ecarruda) 2010-04-01 23:07:35

Lmadsen,

Now I have a core dump with DONT_OPTIMIZE and debug symbols. What do you need? How can I send it to the issue?

Error in GDB:

#0  0x080d3454 in generic_http_callback (format=0, requestor=0xb4511c40, uri=0xb3466400 "", params=0x94f54e0, status=0xb34652e0, title=0xb34652e4, contentlength=0xb34652dc) at manager.c:2914
2914 if ((retval = malloc((wlen = strlen(workspace)) + (tlen = strlen(tmpbuf)) + 128))) {

For bt and bt full, see the attachments.

By: Evandro César Arruda (ecarruda) 2010-04-03 07:30:41

I uploaded backtrace3 with threads info and a simple threads bt.

If you need threads bt full, please let me know.

Thanks

By: Evandro César Arruda (ecarruda) 2010-04-03 07:32:42

Reading backtrace3, we can see the last code executed in manager.c:

It stopped on this if:

if ((retval = malloc((wlen = strlen(workspace)) + (tlen = strlen(tmpbuf)) + 128))) {
        strcpy(retval, workspace);
        strcpy(retval + wlen, tmpbuf);
        c = retval + wlen + tlen;
        /* Leftover space for footer, if any */
        len = 120;
}

By: Evandro César Arruda (ecarruda) 2010-04-06 14:06:24

Well,

The core dump is in strlen. I tried to log a notice with debug; see:

Program terminated with signal 11, Segmentation fault.
#0  0x080d5a2b in generic_http_callback (format=0, requestor=0xb27b3908, uri=0xb2d24400 "", params=0x9504868, status=0xb2d232e0, title=0xb2d232e4, contentlength=0xb2d232dc) at manager.c:2916
2916 ast_log(LOG_NOTICE, "Getting CoreDump: lwork = %d e tlen = %d \r\n\r\n", strlen(workspace), strlen(tmpbuf));

Something is wrong with the workspace variable or tmpbuf.

Thanks

By: Evandro César Arruda (ecarruda) 2010-04-06 14:37:36

News for you:

(gdb) print workspace
$1 = "Content-type: text/plain\r\nSet-Cookie: mansession_id=\"1ee4ea8c\"; Version=\"1\"; Max-Age=2000\r\n\r\n\000\000\000?\201\000\000\001", '\0' <repeats 23 times>, "?\000\000\000\000\000\000\000\000\000\002\000\b\000\000\000\000\000\000\0009??J\000\000\000\0009??J\000\000\000\0009??J\000\000\000\000n\006\000\000\000\000\000\000\214200?\t\005??t\021?\206?\004??_\021?\017\000\000\000}?\020??0?\226A\005??"...
(gdb) print tmpbuf
$2 = 0xb4a1e000 <Address 0xb4a1e000 out of bounds>

By: Evandro César Arruda (ecarruda) 2010-04-07 09:33:15

People,

We can see something wrong. I'm using rawman, so no xml or html formatting is added to the buffer. We can see the strlen of the l variable and tmpbuf.

(gdb) print wlen
$20 = 3030507520
(gdb) print tlen
$21 = 2088
(gdb) print l
$22 = 3030507520
(gdb) print ss.fd
$23 = 156515680
(gdb) print c
$24 = 0x0
(gdb) print workspace
$25 = "I/r4Content-type: text/plain\r\nSet-Cookie: mansession_id=\"1ee4ea8c\"; Version=\"1\"; Max-Age=2000\r\n\r\n\000\000\000?\201\000\000\001", '\0' <repeats 23 times>, "?\000\000\000\000\000\000\000\000\000\002\000\b\000\000\000\000\000\000\0009??J\000\000\000\0009??J\000\000\000\0009??J\000\000\000\000n\006\000\000\000\000\000\000\214200?\t\005??t\021?\206?\004??_\021?\017\000\000\000}?\020??0?\226"...
(gdb) print buf
$26 = 0x18 <Address 0x18 out of bounds>
(gdb) print tmpbuf
$27 = 0x5000 <Address 0x5000 out of bounds>
(gdb)

Is tlen bigger than l because of a memory error?

Thanks
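
A small arithmetic check, offered as an observation rather than a diagnosis: the value gdb prints for wlen and l, 3030507520, is 0xb4a1e000 in hexadecimal. That is the same address gdb printed for tmpbuf in the previous comment, and it falls on a 4096-byte boundary. The sketch below only verifies that arithmetic; the 4096-byte page size and the helper names are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed page size; a real program would ask sysconf(_SC_PAGESIZE). */
#define ASSUMED_PAGE_SIZE 4096u

/* Return 1 when `value` sits exactly on a page boundary, 0 otherwise. */
static int is_page_aligned(uint64_t value)
{
    return (value % ASSUMED_PAGE_SIZE) == 0;
}

/* Print a decimal value from gdb in hex, together with its alignment. */
static void describe(const char *name, uint64_t value)
{
    printf("%s = %llu = 0x%llx, page aligned: %d\n",
           name, (unsigned long long)value, (unsigned long long)value,
           is_page_aligned(value));
}
```

Calling `describe("wlen", UINT64_C(3030507520))` shows the hexadecimal form next to the decimal one, which makes it easier to compare "length" values against the pointers in the same gdb session.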



By: Matteo (mpiazzatnetbug) 2010-04-11 13:28:06

I have the same issue: Asterisk 1.4.26.3, 1000 peers. We are using a post operator AJAM application. Every two days Asterisk crashes. From backtrace.1.4.26.3 it seems to be the same issue as above.

By: Evandro César Arruda (ecarruda) 2010-04-11 14:31:56

Yeah,

Building with DONT_OPTIMIZE reduces the crashes, because it disables some malloc optimizations and lets the variable use 3 GB of RAM.

I've tested with 1.4.26, 1.4.26.2, 1.4.28, 1.4.29 and 1.4.30; all of these versions have the problem. It is somehow related to the workspace variable using a file on the file system to create the buffer.

I have a low number of SIP and IAX peers, about 10; I have 8 SIP peers using AgentLogin() and 12 E1s connected using mfcr2, 360 channels.

I request only QueueStatus and Status from AJAM, and I get a crash when I have about 80 simultaneous calls; 80 x 2 because they use E1 trunks, so 160 channels.

I'm waiting for the Digium people, and I'm trying to create a patch by changing some things, but I haven't had success so far.

Thanks people

By: Evandro César Arruda (ecarruda) 2010-04-11 14:33:30

Hey mpiazzatnetbug,

Can you give me access to your machine to debug your core dump file? If possible, you can send it to my email (we can do it using screen).

My emails: evandro@ezvoice.com.br, evandro@stonts.com, ecarruda@gmail.com

Thanks man.

By: Matteo (mpiazzatnetbug) 2010-04-11 15:02:21

Sorry man, I can't give access to this machine from outside. It's a production system.
Did you check whether 1.4.25 is affected by this issue?

By: Evandro César Arruda (ecarruda) 2010-04-12 10:45:17

mpiazzatnetbug,

First, which action(s) are you calling? Status, QueueStatus?

Can you apply a little patch just to locate the problem? It might also stop Asterisk from crashing. I'm trying to find the problem, but I don't have a consistent scenario to apply this to.

Can you do it? Just a debugging aid for us?

If you can, send me your e-mail and I will attach it here.

Once more: it isn't a patch to solve the problem, it's a patch to display debug information when the problem is "found".



By: Evandro César Arruda (ecarruda) 2010-04-12 14:01:22

People,

I found a missing channel lock in the Status action request; I'm having the problem only with this action.

I backported some things from 1.6.2; please test this patch.

mpiazzatnetbug, can you test this and send your report?

Thanks



By: Evandro César Arruda (ecarruda) 2010-04-12 15:01:01

Forget the added channel lock.

I missed the locker struct; I'm working on a new patch.

By: Matteo (mpiazzatnetbug) 2010-04-15 10:33:27

I applied this patch https://issues.asterisk.org/view.php?id=15495 five days ago.
So far the server has been running without a crash, but I think it's too early to consider the issue solved.
The AJAM application doesn't use queue commands; the app only checks the status of the extensions. I'm also logging the AJAM application messages, so if it crashes again we can find out which request Asterisk crashed on.

By: Evandro César Arruda (ecarruda) 2010-04-15 11:29:17

mpiazzatnetbug,

The old changes make sense to me. I'm sending my new patch; can you test it?

This patch has:

- The old patch
- Backport of 1.6.2 ideas in manager.c
- Changed the temp file handling
- Changed the socket use
- Changed string building
- Added header information
- Cleaned up the old code

Can you test with my patch and send a new report? I'm testing too. The old issue never got a final report to close it; we need to do it now so this can be applied to the stable version.

Thanks one more time.

By: Marcin Kowalczyk (kowalma) 2010-04-15 13:11:01

I've tried to apply this patch against 1.6.1.18, but no luck. Any chance of having this patch ported to 1.6.1?

By: Evandro César Arruda (ecarruda) 2010-04-15 13:36:39

kowalma,

Are you using 1.6.1.18?

I will create one patch to you.

Wait, please



By: Evandro César Arruda (ecarruda) 2010-04-15 14:34:40

kowalma,

Done. Can you test it for us now?

I also uploaded a new patch file for 1.4 (v3), which removes an extra free in the file socket handling.

Thanks man.

By: Marcin Kowalczyk (kowalma) 2010-04-15 14:38:35

Patching process OK.
In a few days I will get back with results, as I have a crash every 4-5 days.

By: Evandro César Arruda (ecarruda) 2010-04-15 14:45:34

kowalma,

If your Asterisk crashes after the patch, can you send me bt, bt full, and thread apply all bt?

Can you also check the Asterisk log for the "Buffer Overflow Detected, memory size exceeded" warning?

Thank you so much.



By: Marcin Kowalczyk (kowalma) 2010-04-15 14:54:48

Sure. NP.
Thx for the quick response

By: Matteo (mpiazzatnetbug) 2010-04-16 09:21:20

I'm using the patch on a test server. I will report the results.

By: Evandro César Arruda (ecarruda) 2010-04-16 09:25:29

Hey mpiazzatnetbug,

Many thanks to you too :D

By: Evandro César Arruda (ecarruda) 2010-04-19 10:13:01

Sorry People,

I sent the wrong file.

lmadsen can you remove for me the file: manager_actstatus_seconds-1.6.2-v1.patch?

It's for another topic.

Thank you soo much, and one more time, sorry.



By: Leif Madsen (lmadsen) 2010-04-19 14:28:35

Patch deleted on request. Is there anything else that I can clean up?

By: Evandro César Arruda (ecarruda) 2010-04-19 14:58:11

Hey Lmadsen,

manager-v1.patch and manager-v2.patch are old attempts; if you can clean them up, you have my OK.

I'm testing the latest patch at two big customers and it's working fine so far; we need to wait for the other members to report something.

Thank you one more time.

By: Evandro César Arruda (ecarruda) 2010-04-22 23:36:14

Hey people,

Does anyone already have some feedback?

Thanks to all of you for your tests

By: Marcin Kowalczyk (kowalma) 2010-04-23 03:56:20

Hi
The patch looks fine for me (1.6.1.18). I haven't seen a crash since patching.

By: Evandro César Arruda (ecarruda) 2010-04-23 15:43:51

Hey kowalma,

Thank you for the feedback, just some questions.

1. Is the number of simultaneous calls the same as before the patch?

2. How often did Asterisk crash before the patch?

3. Did you see any change in memory or CPU use running Asterisk with this patch?

Thank you so much.

My two customers where I'm testing this haven't had any new problems.

By: Marcin Kowalczyk (kowalma) 2010-04-23 16:02:02

1. Now it is a bit higher.
2. Once every 3-5 days; now it's been OK for ~8 days.
3. I did not notice anything suspicious with CPU/memory.

By: Marcin Kowalczyk (kowalma) 2010-04-24 07:40:49

just crashed...


#0  0x080eb751 in process_events (s=0xac5d31bc) at manager.c:2644
2644                    while ( (eqe = NEW_EVENT(s)) ) {
(gdb) bt full
#0  0x080eb751 in process_events (s=0xac5d31bc) at manager.c:2644
       eqe = (struct eventqent *) 0xb1375d30
       ret = 0
#1  0x080eca32 in process_message (s=0xac5d31bc, m=0xac5d2f58) at manager.c:3030
       action = "Login", '\0' <repeats 74 times>
       ret = 0
       tmp = (struct manager_action *) 0x8844a08
       user = 0xac5d2aea "test"
       __PRETTY_FUNCTION__ = "process_message"
#2  0x080eced3 in do_message (s=0xac5d31bc) at manager.c:3127
       m = {hdrcount = 3, headers = {0xac5d2b00 "Action: Login", 0xac5d2ae0 "Username: test", 0xac5d2ac0 "Secret: testowe", 0x0 <repeats 125 times>}}
       header_buf = "\000ecret: testowe", '\0' <repeats 1009 times>
       res = 1
#3  0x080ed168 in session_do (data=0xb775f550) at manager.c:3185
       ser = (struct ast_tcptls_session_instance *) 0xb775f550
       session = (struct mansession_session *) 0x890af98
       s = {session = 0x890af98, f = 0x0, fd = 0}
       flags = 2050
       res = 0
       __PRETTY_FUNCTION__ = "session_do"
#4  0x08140997 in handle_tcptls_connection (data=0xb775f550) at tcptls.c:228
       tcptls_session = (struct ast_tcptls_session_instance *) 0xb775f550
       ssl_setup = (int (*)(SSL *)) 0x805e494 <SSL_accept@plt>
       ret = 349
       err = "\\3]?", '\0' <repeats 16 times>, "\230!?\020\000\000\000\002", '\0' <repeats 11 times>, "`3]?@`?\b\000\000\000\000\000\000\000\000?/?p\232?\000\000\000@", '\0' <repeats 20 times>, "\004\237?\000\000\000\000\0243]?\000\000\000\000\0303]?\0343]?\200\211?\f\000\000\000@`?\000?\017?\000\000\000\000`!??2]?\t??`!?\f\000\000\000\004\237?1\016\002\000?\201:\t\f\000\000\000?/?\000\000\000\000\000\000\000\000(3]?B?\024\b\001\000\000\000\f\000\000\000\0303]????`!?@`?\b8`?\b?"...
       __PRETTY_FUNCTION__ = "handle_tcptls_connection"
       cookie_funcs = {read = 0x8140228 <ssl_read>, write = 0x8140252 <ssl_write>, seek = 0, close = 0x8140273 <ssl_close>}
#5  0x0814d0e0 in dummy_start (data=0xb7bb10a0) at utils.c:968
       __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {-1210437644, 0, 0, -1403178040, -1368997722, 1033986542}, __mask_was_saved = 0}}, __pad = {0xac5d3490, 0x0,
   0x888fec0, 0xb7e1a82e}}
       __cancel_routine = (void (*)(void *)) 0x807637a <ast_unregister_thread>
       __cancel_arg = (void *) 0xac5d3b90
       not_first_call = 0
       ret = (void *) 0xb7ecfd7e
       a = {start_routine = 0x8140418 <handle_tcptls_connection>, data = 0xb775f550,
 name = 0xb7bbed98 "handle_tcptls_connection started at [  275] tcptls.c ast_tcptls_server_root()"}
#6  0xb7d934c0 in start_thread () from /lib/i686/cmov/libpthread.so.0
No symbol table info available.
#7  0xb7e8a61e in clone () from /lib/i686/cmov/libc.so.6
No symbol table info available.

By: Evandro César Arruda (ecarruda) 2010-04-24 07:47:20

kowalma,

Did this crash happen after the patch? Can you send the complete bt and the bt for all threads to my email?

Did you have this problem running the manager over http or over the socket port? Do you remember which actions you were calling?

It's a new problem. If you can, give me remote access to the machine with the core dump so I can debug it; if that isn't possible, just send me the full bts.

ecarruda@gmail.com

Thanks



By: Marcin Kowalczyk (kowalma) 2010-04-24 08:04:40

It crashed after I applied patch.
Patched on 15.04 21:37
Manager is running on socket, we are calling actions

action=login
action=queuestatus
action=status

Unfortunately the machine is reachable via VPN only. I've sent you the BT data via email.

Thx!

By: Konrad Rozycki (krdian) 2010-04-29 04:50:21

Since I applied patch my box is not experiencing any problems with manager.

n1c1asterisk*CLI> show uptime

System uptime: 5 days, 18 hours, 33 minutes, 23 seconds

Before, Asterisk crashed once every 2-3 days. I'm calling actions through http:

action=login
action=queuestatus
action=status

By: Evandro César Arruda (ecarruda) 2010-04-29 08:16:01

hey krdian,

Thanks for your report. kowalma is having a problem with the console manager in the events section; I'm working on a patch upgrade for that too.

Please, people, let me know about any new status.

krdian, did you see the message "memory size exceeded" in the Asterisk logs?



By: Konrad Rozycki (krdian) 2010-05-05 03:04:24

Hi ecarruda,

No I didn't see this message. My system uptime is:
System uptime: 1 week, 4 days, 16 hours, 53 minutes, 7 seconds

By: Evandro César Arruda (ecarruda) 2010-05-05 06:50:59

Hey Krdian,

You haven't had any more crashes after the patch? Is everything working fine for you?

By: Konrad Rozycki (krdian) 2010-05-05 07:34:30

Hey ecarruda,

No more crashes, everything works fine after patch.

By: Matteo (mpiazzatnetbug) 2010-05-17 06:18:56

I have uploaded a new backtrace. This machine was running the manager.c file from the latest stable release, not the patch provided by ecarruda.

By: Evandro César Arruda (ecarruda) 2010-05-17 07:01:53

hey mpiazzatnetbug,

How are you doing, man? Did you make a typo? The latest version is 1.4.31, not 1.4.27.

Thank you.

By: Marcin Kowalczyk (kowalma) 2010-05-17 07:16:23

I had to upgrade to 1.6.2.7 (due to ODBC bug in 1.6.1 branch) and 1.6.2.7 is affected as well.

By: Evandro César Arruda (ecarruda) 2010-05-17 08:38:12

kowalma,


I didn't write a patch for 1.6.2 because I haven't had the problem with this version. Can you send me the backtrace? I can write a patch for this version and test it here.

By: Matteo (mpiazzatnetbug) 2010-05-17 08:56:10

I'm using the manager.c file from version 1.4.31 (which is the same as in 1.4.27); all the other files are from version 1.4.26.3.

After this crash I'm going to install the patch on the production server.
On the test server I have no issues with the patch, but the traffic is too low to give a proper report.

By: Marcin Kowalczyk (kowalma) 2010-05-17 09:38:13

@ecarruda - I've dropped you an email with bt

By: Ove Aursand (aurs) 2010-06-01 01:58:16

I'm sorry I haven't tested this patch yet. Will install tonight (1.4.30)

By: Ove Aursand (aurs) 2010-06-01 02:31:48

manager.c:2929: warning: format ‘%d’ expects type ‘int’, but argument 6 has type ‘size_t’

Have now done make, make install and reload manager from cli. Do I need to restart asterisk? I guess I can just wait for the first crash if I do ;)

By: Evandro César Arruda (ecarruda) 2010-06-01 12:48:32

Sorry People,

I was working on a new project; now I will have time to work on yours.

kowalma, I received your e-mail; I'm working on this.

aurs, which patch gives you this warning? Is it my patch? Are you using manager-1.4-v3.patch?

aurs, you need to restart Asterisk to apply the newly built changes.

People, for me all the problems are solved; I never see core dumps any more :D My customer is happy, and now I need to solve your problems. Go go go, send me reports.

For the people using my patch: did it solve the problem for you too? Does anyone still have a problem?

Thanks

By: Ove Aursand (aurs) 2010-06-01 13:47:38

ecarruda: yes, I'm using manager-1.4-v3 and I'm using Asterisk 1.4.30.
Edit: I have now restarted and can clearly tell that the patch is active. If this patch works, the ast_verbose debugging message (around line 2883) should be removed or moved to a higher verbose level before going into a release :) The SIPPeers polling (that caused my crash) is back on; I'll report the result in about a week, or sooner if I get a crash before that.



By: Evandro César Arruda (ecarruda) 2010-06-01 16:07:30

aurs,

Are you using the clean asterisk-1.4.30 source? I compiled a fresh source tree and I don't get this warning on line 2883; for me line 2883 is an if, not a sprintf or ast_log call. Can you confirm this for me? Please send your asterisk-1.4.30/main/manager.c to my e-mail, ecarruda@gmail.com or evandro@ezvoice.com.br.

Thanks man.

By: Ove Aursand (aurs) 2010-06-02 01:57:28

Yes, I also have an if on line 2883: if (option_verbose >1)
I changed that one to if (option_verbose >3) to reduce the output in cli.
But I got this warning when compiling:
manager.c:2929: warning: format ‘%d’ expects type ‘int’, but argument 6 has type ‘size_t’
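
The warning quoted above comes from passing a size_t to a %d conversion. A minimal sketch of the usual C99 remedy, the %zu conversion, is shown here. The `format_lengths` helper and its message text are hypothetical, and this is not the line from the patch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format two string lengths into `out`. strlen() returns size_t, so the
 * matching printf conversion is %zu (C99), not %d.
 */
static int format_lengths(char *out, size_t outsz,
                          const char *workspace, const char *tmpbuf)
{
    return snprintf(out, outsz, "wlen = %zu, tlen = %zu",
                    strlen(workspace), strlen(tmpbuf));
}
```

On compilers without C99 support, casting each length to unsigned long and using %lu is a common alternative.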

By: Evandro César Arruda (ecarruda) 2010-06-02 08:41:26

aurs,

Thanks for your feedback. I fixed the problem on line 2929 and changed the default verbose check to > 3; now you can apply it to the original manager.c file.

Thanks, and I'm waiting for new feedback :D

By: Evandro César Arruda (ecarruda) 2010-06-04 07:50:20

People,

The problem in manager.c is solved, but now I have a problem in http.c with threads. I've written a patch for this; I will test it for 2 days and upload it here.

Thanks guys

By: Ove Aursand (aurs) 2010-06-08 02:02:33

No crash since applying patch (1.4-v3)
# asterisk -rx "core show uptime"
System uptime: 6 days, 11 hours, 19 minutes, 19 seconds

earlier crashes:
2010-04-29T15:01:02+0200
2010-05-03T10:40:18+0200
2010-05-03T11:35:19+0200
2010-05-03T15:25:19+0200
2010-05-03T15:45:19+0200
2010-05-05T10:00:34+0200

By: Leif Madsen (lmadsen) 2010-06-08 09:54:58

ecarruda: if you have another issue and another patch, please open a new issue, and we can mark the issues as related. Let's not overload a single issue with multiple bugs and patches.

By: Evandro César Arruda (ecarruda) 2010-06-08 11:37:36

aurs,

Did the patch solve your problem? Is everything working fine now?

How many calls do you have?

By: Evandro César Arruda (ecarruda) 2010-06-08 11:44:03

lmadsen,

I have a patch for the http problem too; I will open a new issue, thanks man.

Please delete the file manager-1.4-v4.patch; it's wrong. I will send a new file for versions 1.4.30 and 1.4.32 for review.

Thanks man

By: Ove Aursand (aurs) 2010-06-14 01:40:30

ecarruda: example, on June 7th, I have 9541 cdr rows from this server. And it has not crashed after I applied the patch (the v3 patch):
# asterisk -rx "core show uptime"
System uptime: 1 week, 5 days, 11 hours, 53 seconds

By: Konrad Rozycki (krdian) 2010-06-14 02:53:25

After I applied this patch:

n1c1asterisk*CLI> core show uptime
System uptime: 7 weeks, 2 days, 16 hours, 41 minutes, 5 seconds
Last reload: 5 days, 22 hours, 23 seconds

By: Evandro César Arruda (ecarruda) 2010-06-14 06:33:53

krdian,

Thanks for your feedback. So it worked for you too?

Thanks man.

By: Evandro César Arruda (ecarruda) 2010-06-14 06:57:27

People,

Version v5 is the same as v4; it just corrects the directory/filename in the patch.

Thanks

By: Paul Belanger (pabelanger) 2010-06-14 07:37:38

manager-1.4-v4.patch deleted.

By: Evandro César Arruda (ecarruda) 2010-06-14 07:38:54

pabelanger, thanks man.

By: Miguel Molina (coolmig) 2010-06-21 13:10:22

I was load testing between two 1.6.2.9 asterisk servers and I've got the same crash:

Program terminated with signal 11, Segmentation fault.
#0  process_events (s=0xb761934c) at manager.c:2687
2687 while ( (eqe = NEW_EVENT(s)) ) {
(gdb) bt
#0  process_events (s=0xb761934c) at manager.c:2687
#1  0x080ef847 in do_message (s=0xb761934c) at manager.c:3170
#2  0x080efd17 in session_do (data=0xb7c01800) at manager.c:3235
#3  0x0815181b in dummy_start (data=0xb7c01840) at utils.c:968
#4  0x00818832 in start_thread () from /lib/libpthread.so.0
#5  0x001e1e0e in clone () from /lib/libc.so.6
(gdb) bt full
#0  process_events (s=0xb761934c) at manager.c:2687
       eqe = <value optimized out>
       ret = 0
#1  0x080ef847 in do_message (s=0xb761934c) at manager.c:3170
       m = {hdrcount = 0, headers = {0x0 <repeats 128 times>}}
       header_buf = '\000' <repeats 1024 times>
       res = <value optimized out>
#2  0x080efd17 in session_do (data=0xb7c01800) at manager.c:3235
       s = {session = 0x940fb38, f = 0x0, fd = 0}
       flags = <value optimized out>
       res = <value optimized out>
       __PRETTY_FUNCTION__ = "session_do"
#3  0x0815181b in dummy_start (data=0xb7c01840) at utils.c:968
       __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {-1212147632,
               0, -1218339952, -1218341944, -1720409120, 637544596},
             __mask_was_saved = 0}}, __pad = {0xb7619480, 0x0, 0x0, 0x263ff4}}
       __cancel_arg = 0xb7619b90
       not_first_call = <value optimized out>
       ret = <value optimized out>
#4  0x00818832 in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#5  0x001e1e0e in clone () from /lib/libc.so.6
No symbol table info available.

On this machine I originate calls to the receiving server so basically the AMI actions involved are QueueSummary and Originate. There's no HTTP involved here. Is there any 1.6.2 patch to test? I would be happy to test one.



By: Evandro César Arruda (ecarruda) 2010-06-21 13:16:35

coolmig,

I will write a patch for 1.6.2. Can you send a full backtrace with DONT_OPTIMIZE enabled at compile time?

Thanks man

By: Miguel Molina (coolmig) 2010-06-21 13:21:28

OK I'll compile without optimizations and wait for another crash of this. I'll keep you posted.

By: Andrew Latham (lathama) 2010-06-22 07:57:33

We have noticed this issue and have tested the patches in a lab.  We are reviewing a way of testing internally.  Would a "Test Suite" test be a good idea long term for the http services?

By: Paul Belanger (pabelanger) 2010-06-22 08:05:32

@lathama: Yes, any 'testsuite' test is a good idea.

By: Tilghman Lesher (tilghman) 2010-06-29 19:28:49

This is actually a much simpler problem to solve.  It's an off-by-one error.  The problem only occurs when the file size is exactly a multiple of the page size, which on most modern architectures is 4096.  In all of the backtraces here, it occurred when the file size was exactly 4096 or 8192 (in frame #1, the value of l, and in frame #0, the value of x).

Therefore, we only need to increment the mmap by 1 and ensure that we're null terminated on the last byte.
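
The condition described above can be illustrated without touching mmap itself. When a file is mapped, any part of the last page beyond the end of the file reads as zeros, and that padding is what accidentally terminates the string. When the file length is an exact multiple of the page size, there is no such padding. The following is a minimal sketch, assuming a 4096-byte page and using hypothetical helper names. It checks for the risky sizes and shows one plain alternative: reading the file into a buffer one byte larger than its length. That alternative is a swapped-in technique for illustration, not the committed patch.

```c
#include <stdio.h>
#include <stdlib.h>

/* 1 if a mapping of `file_size` bytes has no zero padding after the data. */
static int lacks_implicit_nul(size_t file_size, size_t page_size)
{
    return file_size > 0 && (file_size % page_size) == 0;
}

/*
 * Read the whole of `f` (from the start) into a freshly allocated buffer
 * that is always NUL-terminated. Returns NULL on failure; caller frees.
 */
static char *read_all_terminated(FILE *f, size_t *out_len)
{
    long end;
    char *buf;
    size_t got;

    if (fseek(f, 0, SEEK_END) != 0 || (end = ftell(f)) < 0)
        return NULL;
    rewind(f);

    buf = malloc((size_t)end + 1);      /* one extra byte for the NUL */
    if (buf == NULL)
        return NULL;

    got = fread(buf, 1, (size_t)end, f);
    buf[got] = '\0';                    /* terminate explicitly */
    if (out_len != NULL)
        *out_len = got;
    return buf;
}
```

Because the terminator is written explicitly, strlen on the result is safe for any file size, including exactly 4096 or 8192 bytes.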



By: Evandro César Arruda (ecarruda) 2010-06-29 22:57:58

tilghman,

Yeah, something like that; I incremented it to 8192, and it worked.

I will apply your patch and test it; give me a few days.

Thanks



By: Tilghman Lesher (tilghman) 2010-07-19 17:32:44

ecarruda:  has your test completed yet?

By: Ove Aursand (aurs) 2010-07-20 04:08:02

I'm still going strong with the v3 version of the patch.
System uptime: 4 weeks, 6 days, 10 hours, 35 minutes, 11 seconds
Last reload: 1 week, 6 days, 1 hour, 8 minutes, 28 seconds
Asterisk 1.4.32

By: Digium Subversion (svnbot) 2010-07-20 11:37:17

Repository: asterisk
Revision: 278023

U   branches/1.4/main/manager.c

------------------------------------------------------------------------
r278023 | tilghman | 2010-07-20 11:37:17 -0500 (Tue, 20 Jul 2010) | 7 lines

Off-by-one error

(closes issue ASTERISK-15359)
Reported by: nik600
Patches:
      20100629__issue16506.diff.txt uploaded by tilghman (license 14)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=278023

By: Digium Subversion (svnbot) 2010-07-20 11:50:10

Repository: asterisk
Revision: 278024

_U  trunk/
U   trunk/main/manager.c

------------------------------------------------------------------------
r278024 | tilghman | 2010-07-20 11:50:10 -0500 (Tue, 20 Jul 2010) | 14 lines

Merged revisions 278023 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r278023 | tilghman | 2010-07-20 11:37:18 -0500 (Tue, 20 Jul 2010) | 7 lines
 
 Off-by-one error
 
 (closes issue ASTERISK-15359)
  Reported by: nik600
  Patches:
        20100629__issue16506.diff.txt uploaded by tilghman (license 14)
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=278024

By: Digium Subversion (svnbot) 2010-07-20 11:54:20

Repository: asterisk
Revision: 278025

_U  branches/1.6.2/
U   branches/1.6.2/main/manager.c

------------------------------------------------------------------------
r278025 | tilghman | 2010-07-20 11:54:19 -0500 (Tue, 20 Jul 2010) | 21 lines

Merged revisions 278024 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r278024 | tilghman | 2010-07-20 11:50:11 -0500 (Tue, 20 Jul 2010) | 14 lines
 
 Merged revisions 278023 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r278023 | tilghman | 2010-07-20 11:37:18 -0500 (Tue, 20 Jul 2010) | 7 lines
   
   Off-by-one error
   
   (closes issue ASTERISK-15359)
    Reported by: nik600
    Patches:
          20100629__issue16506.diff.txt uploaded by tilghman (license 14)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=278025