jackrabbit-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: LazyTextExtractorField and background text extraction
Date Thu, 16 Jul 2009 10:28:27 GMT
Hi,

On Thu, Jul 16, 2009 at 11:51 AM, Marcel Reutegger
<marcel.reutegger@gmx.net> wrote:
> I'm not sure I understand that correctly. With the current design,
> multiple nodes are already indexed in parallel. But the index update
> as a whole will still be blocked, waiting for *all* nodes to be
> indexed.

OK, I'm just getting up to speed with the latest state of the indexing code.

If I understand correctly, we update the search index within the
transaction, but if a text extraction task takes longer than the
configurable limit, that part of the index update is replaced with an
empty string and a new background task is fired to update the index
for that document once the text extraction is complete.
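
For illustration, a minimal Java sketch of that current behaviour,
assuming a plain ExecutorService in place of Jackrabbit's own indexing
threads and a timeout given in milliseconds. The names
InlineExtractionSketch, extractOrDefer and simulatedExtraction are
hypothetical, not Jackrabbit APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;
import java.util.function.Supplier;

class InlineExtractionSketch {

    /**
     * Hypothetical helper: waits up to timeoutMillis for the extracted text.
     * If extraction is too slow, returns "" for the in-transaction index
     * update and calls reindex once the text becomes available.
     */
    static String extractOrDefer(Supplier<String> extraction, long timeoutMillis,
                                 ExecutorService pool, Consumer<String> reindex) {
        CompletableFuture<String> task = CompletableFuture.supplyAsync(extraction, pool);
        try {
            return task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Too slow: index an empty string now, re-index in the background.
            task.thenAccept(reindex);
            return "";
        } catch (ExecutionException e) {
            return ""; // extraction failed: index the node without its text
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        }
    }

    /** Stand-in for a text extractor that needs delayMillis to finish. */
    static Supplier<String> simulatedExtraction(String text, long delayMillis) {
        return () -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return text;
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        String fast = extractOrDefer(simulatedExtraction("quick", 0), 1000, pool,
                text -> System.out.println("re-indexed: " + text));
        String slow = extractOrDefer(simulatedExtraction("slow", 500), 50, pool,
                text -> System.out.println("re-indexed: " + text));
        System.out.println("inline: '" + fast + "', '" + slow + "'");
        pool.shutdown(); // already-submitted tasks still run to completion
    }
}
```

Note that the transaction itself only ever waits for at most the
timeout; the cost is a second index update for every slow document.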

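Would it be a problem to *always* defer text extraction to a
background task that's disconnected from the transaction? That would
make things a lot simpler, at the cost of a slight loss of
functionality: the extracted text would no longer be searchable
immediately after the transaction commits.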
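
A rough Java sketch of the always-deferred variant, assuming a
single-threaded background executor and a ConcurrentHashMap as a toy
stand-in for the search index. DeferredIndexer, indexNode and
indexedText are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class DeferredIndexer {
    // Toy stand-in for the search index: node id -> indexed full text.
    private final Map<String, String> index = new ConcurrentHashMap<>();
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    /** Called inside the transaction; never waits for text extraction. */
    Future<?> indexNode(String nodeId, Supplier<String> extraction) {
        index.put(nodeId, ""); // the node is indexed now, without its text
        // Extraction and the follow-up index update run outside the transaction.
        return background.submit(() -> index.put(nodeId, extraction.get()));
    }

    String indexedText(String nodeId) {
        return index.get(nodeId);
    }

    void shutdown() {
        background.shutdown();
    }

    public static void main(String[] args) throws Exception {
        DeferredIndexer indexer = new DeferredIndexer();
        CompletableFuture<Void> extractorMayFinish = new CompletableFuture<>();
        Future<?> done = indexer.indexNode("doc1", () -> {
            extractorMayFinish.join(); // simulate a slow extractor
            return "full text";
        });
        // Right after the "commit" the node is indexed, but its text is not.
        System.out.println("after commit: '" + indexer.indexedText("doc1") + "'");
        extractorMayFinish.complete(null);
        done.get(5, TimeUnit.SECONDS);
        System.out.println("after extraction: '" + indexer.indexedText("doc1") + "'");
        indexer.shutdown();
    }
}
```

The window between the first and second index update is exactly the
lost functionality: a query issued in that window does not see the text.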

Alternatively, we should probably move the extraction timeout handling
to a getExtractedText(long timeout) method that does a wait(timeout)
call on the extraction task, waiting for it to return the extracted
text as a String. If the timeout is reached, an empty string is used
instead and the rest of the extraction task is placed in the indexing
queue.
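
One possible shape for such a getExtractedText(long timeout) method,
sketched with Object.wait/notifyAll and assuming a timeout in
milliseconds. ExtractionTask, TimeoutExtractionDemo, textForIndex and
the indexingQueue field are hypothetical names, not existing Jackrabbit
classes.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

class TimeoutExtractionDemo {
    // Hypothetical queue of extractions to finish outside the transaction.
    static final BlockingQueue<ExtractionTask> indexingQueue = new LinkedBlockingQueue<>();

    /** Text to index now; slow extractions are queued for re-indexing. */
    static String textForIndex(ExtractionTask task, long timeout) throws InterruptedException {
        task.getExtractedText(timeout); // block for at most `timeout` ms
        if (task.isDone()) {
            return task.getExtractedText(0); // ready, possibly just after the timeout
        }
        indexingQueue.add(task); // finish and re-index outside the transaction
        return "";
    }

    public static void main(String[] args) throws InterruptedException {
        ExtractionTask slow = new ExtractionTask(() -> {
            try {
                Thread.sleep(500); // simulate a slow extractor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "slow text";
        });
        new Thread(slow).start();
        System.out.println("indexed now: '" + textForIndex(slow, 50) + "'");
        ExtractionTask queued = indexingQueue.take(); // background indexer side
        System.out.println("indexed later: '" + queued.getExtractedText(5000) + "'");
    }
}

class ExtractionTask implements Runnable {
    private final Supplier<String> extractor;
    private String text; // guarded by "this"; null until extraction completes

    ExtractionTask(Supplier<String> extractor) {
        this.extractor = extractor;
    }

    @Override
    public void run() {
        String result = extractor.get(); // slow part, runs without holding the lock
        synchronized (this) {
            text = (result != null) ? result : "";
            notifyAll(); // wake up callers blocked in getExtractedText()
        }
    }

    synchronized boolean isDone() {
        return text != null;
    }

    /** Waits up to timeout ms for the text; returns "" if it is not ready yet. */
    synchronized String getExtractedText(long timeout) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeout;
        while (text == null) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return "";
            }
            wait(remaining); // releases the lock; the loop handles spurious wakeups
        }
        return text;
    }
}
```

The caller checks isDone() after the wait so that a task finishing just
after the timeout is indexed directly instead of being queued.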

BR,

Jukka Zitting
