jackrabbit-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: LazyTextExtractorField and background text extraction
Date Thu, 16 Jul 2009 10:28:27 GMT

On Thu, Jul 16, 2009 at 11:51 AM, Marcel Reutegger <marcel.reutegger@gmx.net> wrote:
> I'm not sure I understand that correctly. with the current design
> multiple nodes are already indexed in parallel. but the index update
> as a whole will still be blocked, waiting for *all* nodes to be
> indexed.

OK, I'm just getting up to speed with the latest state of the indexing code.

If I understand correctly, we update the search index within the
transaction. If a text extraction task takes longer than the
configurable limit, that part of the index update is replaced with an
empty string, and a new background task is started to update the index
for that document once the text extraction is complete.

Would it be a problem to *always* defer text extraction to a
background task that's disconnected from the transaction? That would
make things a lot simpler at a slight loss of functionality.
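
To make the "always defer" idea concrete, here is a rough sketch. It is not
existing Jackrabbit code: DeferredTextIndex, indexNode and indexedText are
made-up names, and a plain map stands in for the real search index. The
transaction thread only writes an empty placeholder and hands the extraction
to an executor. The background task fills in the text later, which is where
the slight loss of functionality comes from (the full text is not searchable
right after the commit).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical stand-in for the search index: node id -> indexed full text.
final class DeferredTextIndex {
    private final Map<String, String> fullText = new ConcurrentHashMap<>();
    private final ExecutorService background;

    DeferredTextIndex(ExecutorService background) {
        this.background = background;
    }

    // Called from the transaction: never waits for the extractor.
    Future<?> indexNode(String nodeId, Supplier<String> extractor) {
        // Index an empty placeholder right away.
        fullText.put(nodeId, "");
        // Run the extraction outside the transaction and update the entry
        // once the text is available.
        return background.submit(() -> {
            fullText.put(nodeId, extractor.get());
        });
    }

    String indexedText(String nodeId) {
        return fullText.get(nodeId);
    }
}
```

The returned Future is only there so that a caller (or a test) can wait for
the background update if it wants to; the transaction itself would ignore it.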

Alternatively, we should probably move the extraction timeout handling
to some getExtractedText(long timeout) method that does a
wait(timeout) call on the extraction task, waiting for it to return
the extracted text as a String. If the timeout is reached, then just
an empty string is used and the rest of the extraction task is placed
in the indexing queue.
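
Something along these lines, as a sketch only. ExtractionJob and the queue
passed to it are hypothetical, not existing Jackrabbit classes, and I'm using
Future.get(timeout, unit) from java.util.concurrent in place of a raw
wait(timeout) call. Returning an empty string on extraction failure or
interruption is also just an assumption of the sketch.

```java
import java.util.Queue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper around a running text extraction task.
final class ExtractionJob {
    private final Future<String> task;
    // Stand-in for the indexing queue; should be a thread-safe Queue.
    private final Queue<ExtractionJob> indexingQueue;

    ExtractionJob(Future<String> task, Queue<ExtractionJob> indexingQueue) {
        this.task = task;
        this.indexingQueue = indexingQueue;
    }

    // Waits up to timeoutMillis for the extracted text. On timeout the job
    // is queued for a later index update and an empty string is returned.
    String getExtractedText(long timeoutMillis) {
        try {
            return task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            indexingQueue.add(this); // extraction keeps running in the background
            return "";
        } catch (ExecutionException e) {
            return ""; // extraction failed: index the node without full text
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return "";
        }
    }
}
```

Whoever drains the queue can later call getExtractedText again with a longer
timeout and update the index for that document.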


Jukka Zitting
