jackrabbit-dev mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Remove pooling for text extraction
Date Fri, 12 Sep 2008 12:48:04 GMT
Hi,

Jukka Zitting wrote:
> Hi,
> 
> In JCR-390 we added support for text extraction in background threads.
> This was done with the PooledTextExtractor class that maintains a pool
> of threads for this purpose. Do we need that pool, or could we simply
> just start a new thread for each new extraction task? That would
> simplify the indexing code.

that would probably simplify the code, however I don't consider the current
code so complicated that it requires simplification. what you would lose is the
ability to limit the number of concurrent text extraction tasks. I think this is
a major drawback.
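
to illustrate the point about limiting concurrency, here is a minimal sketch
using a fixed-size java.util.concurrent pool. it is not the actual
PooledTextExtractor code; the class name BoundedExtractionSketch, the method
peakConcurrency and the sleep that stands in for parsing are all hypothetical,
and it assumes a Java version with lambdas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only, not Jackrabbit's PooledTextExtractor.
public class BoundedExtractionSketch {

    /**
     * Submits taskCount dummy "extraction" tasks to a pool of poolSize
     * threads and returns the highest number of tasks seen running at once.
     */
    static int peakConcurrency(int poolSize, int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(() -> {
                    // Record how many tasks are active right now.
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20); // stand-in for parsing a document
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            // Wait for every task to finish before reading the peak.
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // With a pool of 2, the reported peak never exceeds 2.
        System.out.println("peak = " + peakConcurrency(2, 8));
    }
}
```

with one new thread per task the peak would instead grow with the number of
queued documents, which is the drawback I mean.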

> The time to start a new thread is probably minimal compared to that of
> parsing a document.

I agree.

> And when you're parsing a lot of large documents,
> much of the time is spent waiting for IO so the more concurrent
> threads you have the better throughput you get.

my experience is the other way around. the libraries we are using to extract
text are rather CPU intensive. IO is rarely the limiting factor.

I prefer to keep the current implementation.

regards
 marcel
