[Home]

Summary:ASTERISK-22276: Test test_hashtab_thrash fails on 32-bit machines when compiled without DEBUG_THREADS
Reporter:Matt Jordan (mjordan)Labels:
Date Opened:2013-08-11 13:01:55Date Closed:2013-08-14 13:12:18
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Tests/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:
Description:Upon disabling the {{DEBUG_THREADS}} and {{DETECT_DEADLOCKS}} build options, the test {{test_hashtab_thrash}} began to fail on the 32-bit CentOS 6 build agent:

[https://bamboo.asterisk.org/bamboo/browse/AST-ATRUNKUNIT-84]

Note that it does not fail on the 64-bit CentOS 6 build agent.

While debugging the test failure, the following was found:

# The test only fails when {{DEBUG_THREADS}} is disabled. The test passes with it enabled. The presence {{DETECT_DEADLOCKS}} has no effect one way or the other.
# The test fails due to a substantial slow down in execution. The test's {{MAX_TEST_SECONDS}} parameter is hit and the test cancels due to taking too long.
# Putting some debugging in here:
{noformat}
for (i = 0; i < data->max_grow; ++i) {
char *obj;
if (is_timed_out(data)) {
return "Growth timed out";
}
obj = ht_new(i);
if (obj == NULL) {
return "Allocation failed";
}
ast_hashtab_insert_immediate(data->to_be_thrashed, obj);
ast_atomic_fetchadd_int(&data->grow_count, 1);
}
{noformat}
Reveals that the call to {{ast_hashtab_insert_immediate}} increases the execution time dramatically.
# Commenting out the calls to {{ast_rwlock_wrlock}} and {{ast_rwlock_unlock}} in {{ast_hashtab_insert_immediate}} causes the test to pass.
{noformat}
// if (tab->do_locking)
// ast_rwlock_wrlock(&tab->lock);

h = (*tab->hash)(obj) % tab->hash_tab_size;

#if (defined(MALLOC_DEBUG) && !defined(STANDALONE))
res = _ast_hashtab_insert_immediate_bucket(tab, obj, h, file, lineno, func);
#else
res = ast_hashtab_insert_immediate_bucket(tab, obj, h);
#endif

// if (tab->do_locking)
// ast_rwlock_unlock(&tab->lock);
{noformat}
# Creating the hashtab object without locking also solves the performance problem; however, the test fails due to concurrency problems.

A few notes:
# I attempted some profiling, however, since gprof only works on the binary, it didn't reveal anything specific to this problem (other than calls to hashtab functions increased dramatically in time spent in the function).
## Module embedding failed due to some bugs in that compile time option; however, if those are resolved it may be possible to profile further to narrow down what exactly fails
# Commenting out all of the {{DEBUG_THREADS}} code in {{ast_rwlock_wrlock}} also reproduces the problem. So the issue does seem to be in the way in which that function is compiled without {{DEBUG_THREADS}}.

Based on this, it is highly likely that without {{DEBUG_THREADS}}, {{ast_rwlock_wrlock}} is inlined. I would guess that this is causing substantial cache misses; however, I haven't tried Oprofile yet to determine if that is the root cause. Additionally, inclusion of the {{__attribute__((noinline))}} did not seem to alter the behavior.
Comments: