lucene-solr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lance Norskog (JIRA)" <>
Subject [jira] Commented: (SOLR-1539) XPathEntityProcessor timeout when stream=true
Date Tue, 03 Nov 2009 04:12:59 GMT


Lance Norskog commented on SOLR-1539:

Why does it need a separate thread?

> XPathEntityProcessor timeout when stream=true
> ---------------------------------------------
>                 Key: SOLR-1539
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Chris Eldredge
>            Assignee: Noble Paul
>         Attachments: SOLR-1539.patch
> When setting stream=true on XPathEntityProcessor a separate thread is created to read
whatever Reader is being used for rows while the original thread pumps a BlockingQueue.  This
design allows the Reader to be read even when DIH cannot process documents as quickly as they
become available in the Reader.
> This design has questionable value.  It adds complexity to the code with unclear benefits
to the user.
> At any rate, the code incorrectly uses the BlockingQueue API:
> 1.  Arbitrarily sets a 10 second timeout and fails when this timeout elapses before a
row becomes available.
> 2.  Fails to check the return code when calling offer() to see if the item was successfully
added or if the queue is full.
> 3.  Fails to stop consuming the Reader even after an import has failed or been aborted.
> The effect is that if a URL being processed pauses more than 10 seconds to think in between
streaming rows, the XPathEntityProcessor fails.  Setting the readTimeout and connectionTimeout
attributes on the dataSource does not address this bug because XPathEntityProcessor imposes
its own timeout, hard-coded to 10 seconds.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message