Summary: ASTERISK-19154: huge number of sip OPTION on 'sip reload'
Reporter: Nicolò Mazzon (nmazzon)
Labels:
Date Opened: 2012-01-02 05:02:53.000-0600
Date Closed: 2012-07-31 16:01:21
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: 1.8.8.0
Frequency of Occurrence: Constant
Related Issues:
Environment:
Attachments: ( 0) issue19154.patch
Description: In a large installation with thousands of registered peers that have qualify set to yes, the "sip reload" command sends a huge number of SIP OPTIONS requests.
This can generate heavy network traffic through the switch in a short time, causing temporary flooding.
The result is poor performance of both the network and the system.

The following is a brief analysis of the problem, referring to the source code of asterisk-1.8.8.0\channels\chan_sip.c.
A "sip reload" command calls the following procedures:
{code}
sip_do_reload
reload_config(reason = CHANNEL_CLI_RELOAD)
build_peer
reg_source_db
{code}

reload_config loads every peer, user and friend (line: 28309). Looking into reg_source_db (line: 13364), we find:
{code}
if (sipsock < 0) {
    /* SIP isn't up yet, so schedule a poke only, pretty soon */
    AST_SCHED_REPLACE_UNREF(peer->pokeexpire, sched, ast_random() % 5000 + 1, sip_poke_peer_s, peer,
        unref_peer(_data, "removing poke peer ref"),
        unref_peer(peer, "removing poke peer ref"),
        ref_peer(peer, "adding poke peer ref"));
} else {
    sip_poke_peer(peer, 0);
}
{code}

Because sipsock is already a valid (non-negative) socket at runtime, the else branch is taken and every peer is poked immediately.
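
The difference between the two branches can be sketched in isolation. This is a minimal illustration, not Asterisk code and not the patch attached to this issue: the function names, the `rnd` parameter (a stand-in for ast_random()) and the `window_ms` parameter are all hypothetical. It only shows that with an open socket every peer gets a delay of 0, and what a jittered delay would look like if it were applied on reload too.

```c
#include <assert.h>

/* Hypothetical stand-in for the branch quoted from reg_source_db.
 * Returns the delay in milliseconds before the first OPTIONS poke. */
unsigned current_poke_delay_ms(int sipsock, unsigned rnd)
{
    if (sipsock < 0) {
        return rnd % 5000 + 1;   /* socket not up yet: jitter of 1..5000 ms */
    }
    return 0;                    /* socket up (the reload case): poke at once */
}

/* Hypothetical alternative: always spread the pokes over window_ms. */
unsigned staggered_poke_delay_ms(unsigned rnd, unsigned window_ms)
{
    assert(window_ms > 0);
    return rnd % window_ms + 1;  /* 1..window_ms ms */
}

/* Count how many of n_peers would be poked with no delay at all. */
unsigned immediate_pokes(int sipsock, unsigned n_peers)
{
    unsigned count = 0;
    for (unsigned i = 0; i < n_peers; i++) {
        /* i stands in for a random value; only the sipsock test matters */
        if (current_poke_delay_ms(sipsock, i) == 0) {
            count++;
        }
    }
    return count;
}
```

With an open socket, `immediate_pokes(5, 10000)` counts all 10000 peers, which is the burst described above. With `sipsock < 0` the count is 0, because each poke is deferred by at least 1 ms.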


Is there a way to avoid this problem?
Comments:

By: Stefan Schmidt (schmidts) 2012-01-02 09:11:49.141-0600

I have noticed this several times, but normally the problem it causes is not a network problem.

Even with 10,000 registered peers, poking every one of them would be 14 Mbit if you calculate with the MTU size, or around 6.8 Mbit if you only count the real size. This shouldn't be a problem at all.
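
As a rough check of this estimate, the burst size can be computed directly. The sketch below assumes one OPTIONS request per peer and a 1500-byte Ethernet MTU. Both values and the helper names are assumptions, not taken from the comment. Under those assumptions the whole burst is 15,000,000 bytes, which is about 14.3 MiB or 120 Mbit, so the figure of 14 would match if it is read as megabytes rather than megabits.

```c
#include <assert.h>
#include <stdint.h>

/* Total payload of one poke burst, in bytes (hypothetical helper). */
uint64_t burst_bytes(uint64_t peers, uint64_t bytes_per_packet)
{
    return peers * bytes_per_packet;
}

/* The same burst expressed in bits. */
uint64_t burst_bits(uint64_t peers, uint64_t bytes_per_packet)
{
    return burst_bytes(peers, bytes_per_packet) * 8;
}
```

Either way, the total is small next to the capacity of a typical switched LAN, which agrees with the point that bandwidth is not the main concern.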

The bigger problem in this case is that Asterisk will not be fast enough to process the response packets in time, so you will see some peers flagged as Delayed or even offline because Asterisk takes too long to process all the OPTIONS responses.

IMHO the qualifypeers setting should be used to avoid such situations.

By: Stefan Schmidt (schmidts) 2012-01-05 03:15:55.000-0600

I have just put a patch on reviewboard for this: https://reviewboard.asterisk.org/r/1652/

By: Stefan Schmidt (schmidts) 2012-01-05 09:36:20.737-0600

Same patch as on reviewboard.

By: Matt Jordan (mjordan) 2012-01-23 16:23:43.802-0600

Stefan - I ack'd this issue for now to put it out of Triage, on the assumption that you'll commit your patch.  Let me know if that isn't the case.