Details

    • Type: Bug
    • Status: Closed
    • Severity: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Target Release Version/s: None
    • Component/s: Core/General
    • Labels: None
    • Mantis ID: 1124
    • Regression: No

      Description

      My production Asterisk system crashed for the first time today, with no
      apparent cause. The server has had no changes for 3 weeks and is rebooted
      every night. The load was low when the crash occurred, and the logs give
      no indication of what caused it. This server usually handles about 5000
      calls a day with no problems.

      I ran gdb on my 182MB core file and have posted the gdb output.

      It seems to be an invalid memory reference in strlen () from /lib/i686/libc.so.6

      From what I have read, this can have many causes, such as referencing an array element far outside the array's bounds.

      The Asterisk server is running CVS-02/09/04-10:44:07 on RedHat 9.0; libc has not been updated recently on the machine.

      I would appreciate any help as I have never debugged this kind of crash before.

        Activity

        Brian West added a comment -

        You need to type 'bt' once you have gdb loaded up to get the useful info.

        Matt Florell added a comment -

        Sorry about that, here's the bt output:

        #0 0x4015d1b3 in strlen () from /lib/i686/libc.so.6
        #1 0x40153696 in _IO_str_init_static_internal () from /lib/i686/libc.so.6
        #2 0x40147b97 in vsscanf () from /lib/i686/libc.so.6
        #3 0x401440ad in sscanf () from /lib/i686/libc.so.6
        #4 0x08078a79 in ast_ouraddrfor (them=0x0, us=0x4b1785a0) at acl.c:308
        #5 0x42cf902f in ast_sip_ouraddrfor (them=0x4352eb1c, us=0x0) at chan_sip.c:475
        #6 0x42cf82d7 in sip_poke_peer (peer=0x810b2d0) at chan_sip.c:5873
        #7 0x42d0569c in sip_poke_peer_s (data=0x0) at chan_sip.c:4812
        #8 0x080521f9 in ast_sched_runq (con=0x80f7388) at sched.c:376
        #9 0x42d01af3 in do_monitor (data=0x0) at chan_sip.c:5770
        #10 0x400269b1 in pthread_start_thread () from /lib/i686/libpthread.so.0

        James Golovich added a comment -

        If I'm reading the code right, we should be checking whether 'them' is NULL in ast_ouraddrfor, since ast_ouraddrfor is called to look up which address we should use as the source for packets sent to the destination address contained in 'them'.
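
The backtrace shows strlen () failing underneath sscanf (), which is what happens when the input string handed to sscanf () is not a valid pointer. Below is a minimal sketch of the kind of defensive check discussed here. It is not the code from acl.c: the helper name parse_hex_field and its hexadecimal field format are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not taken from acl.c: parse a hexadecimal field
 * only after confirming that both pointers are usable.
 * Returns 0 on success and -1 on any failure. */
int parse_hex_field(const char *field, unsigned int *out)
{
    if (field == NULL || out == NULL) {
        /* Refuse early instead of letting sscanf() run strlen() on NULL. */
        return -1;
    }
    if (sscanf(field, "%x", out) != 1) {
        /* The field did not start with a hexadecimal number. */
        return -1;
    }
    return 0;
}
```

A caller would treat a -1 return as "no usable value" and skip the entry rather than crash. Whether a guard like this belongs in ast_ouraddrfor itself or in its callers depends on where the bad pointer actually comes from, which this backtrace alone does not establish.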

        Mark Spencer added a comment -

        Unfortunately, looking at the actual code seems to suggest the addresses can never be NULL, because they are passed as references on the stack. Further, notice that frames #4 and #5 (ast_ouraddrfor and ast_sip_ouraddrfor) are not in agreement. Perhaps there is some sort of stack corruption? I can log in to your box and look, but your code is a bit dated.

        Matt Florell added a comment -

        I've updated to CVS 2004-03-01, and we'll see if this happens again. It took 3 weeks before this happened so I have no idea how long it will take to happen again, if it does. I'll update CVS every week just to make sure I have a somewhat recent version on this machine.

        Mark Spencer added a comment -

        I'm going to resolve this as unable to duplicate, but if the problem happens again, just open it back up.


          People

          • Votes: 0
          • Watchers: 0
