incubator-bloodhound-dev mailing list archives

From Olemis Lang <ole...@gmail.com>
Subject [RFC] File descriptor overflow running the test suite WAS: [BEP-0003] #355 - Summary of the work done towards multi-product support (2013-02-16)
Date Tue, 19 Feb 2013 19:35:50 GMT
Hi all!

On 2/18/13, Olemis Lang <olemis@gmail.com> wrote:
>
[...]
> Up to this point there is a subset of Trac test suite that may be run
> against product environments . It consists of 794 test cases (334
> success , 30 failures , 430 errors) . The high number of failures is
> mainly a consequence of the fact that recently I've just prioritized
> TC conversion rather than making them work .

There's something going awry on my side, so this is a request for comments.

Up to this point the number of reported test errors was high. The root
cause was the inefficient setUp / tearDown cycle for the test logger,
which ends up in the infamous «Too many open files» system error. The
attachment:t355_r1446579_cleanup_logging.diff:ticket:355 definitely
fixes that problem, as can be corroborated by watching file descriptors
while running the test suite as follows.

{{{
#!sh

$ watch -n 1 "ls -l /proc/6443/fd/ | grep -v "/var/tmp" | cut -d '>'
-f 2 | sort | uniq -c"

      3  /dev/pts/9
      1  /path/to/bloodhound/trac/testing.log
      1  socket:[1041009]
      1  /tmp/trac-testing.log
      1 total 0

}}}

The number of useful file descriptors never goes beyond 6 (the
«total 0» line is just noise ;). Before applying the aforementioned
patch, the number of file descriptors pointing at /tmp/trac-testing.log
grew as O(n) over time (n = number of test cases instantiated and run).
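
For the record, something like the following should be enough to watch
that growth, in case somebody wants to reproduce it (the PID below is
just the one of my test process, so substitute your own):

{{{
#!sh

# Count descriptors pointing at the test log (6443 is just the PID of
# my test process)
$ watch -n 1 "ls -l /proc/6443/fd/ | grep -c 'trac-testing.log'"

}}}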

As you can notice [1]_ the total number of test cases in the suite
increased after the work I did yesterday. Nevertheless, in a full run
the number of successful test cases has dropped and the number of
errors has grown considerably. The same does not happen if only a
subset is executed, i.e. when the number of test cases is relatively
small, so the situation is not as bad as it seems ;)

Notice that in the command pasted above I filtered out files under the
/var/tmp folder, so I decided to watch those fd(s) and determine
whether they might be causing the trouble. So I ran this command

{{{
#!sh

$ watch -n 1 "ls -l /proc/9591/fd/ | grep '/var/tmp' | wc --lines"

}}}

... and this is what I got:

  1. While loading the test cases, the number of hits (i.e. descriptors
     for files under /var/tmp) grows to about 680.
  2. While running the test suite, that number keeps increasing at an
     almost constant rate of 2 to 4 new fd(s) at a time.
  3. When the number reaches 1018 (i.e. the 1024 max minus the 6 fd
     listed above) every subsequent TC is reported as an error (see the
     snippet right after this list to double-check that limit).
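
To double-check that 1024 is indeed the limit in play here, something
like this should do (again, 9591 is just the PID of my test process):

{{{
#!sh

# Per-process limit reported by the kernel, plus the soft limit of the
# current shell
$ grep 'Max open files' /proc/9591/limits
$ ulimit -n

}}}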

I ran a similar command once again, this time without the wc summary,
and it seems all such files have already been deleted o.O

{{{
#!sh

$ watch -n 1 "ls -l /proc/9591/fd/ | grep '/var/tmp'"

lrwx------ 1 user user 64 2013-02-19 14:10 10 ->
/var/tmp/etilqs_nAY7Yzw6gdPXNy1 (deleted)
[...]

}}}

Their names always look like the one shown above, i.e.
/var/tmp/etilqs_<magic numbers>
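
In case it helps, lsof (if installed) can also list the descriptors of
the test process that still point at files under /var/tmp, marking the
already deleted ones, e.g.

{{{
#!sh

# 9591 is just the PID of my test process ; deleted files show up with
# a trailing "(deleted)" in the output
$ lsof -p 9591 | grep '/var/tmp/etilqs_'

}}}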

Q:

  - Does anybody have a clue about what this is about?

Thanks in advance for your comments.

-- 
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article:
