lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: Bulk indexing data into solr
Date Thu, 26 Jul 2012 15:46:37 GMT
On 7/26/2012 7:34 AM, Rafał Kuć wrote:
> If you use Java (and I think you do, because you mention Lucene) you
> should take a look at StreamingUpdateSolrServer. It not only allows
> you to send data in batches, but also index using multiple threads.

A caveat to what Rafał said:

The streaming object has no error detection out of the box.  It queues 
everything up internally and returns immediately.  Behind the scenes, it 
uses multiple threads to send documents to Solr, but any errors 
encountered are simply sent to the logging mechanism, then ignored.  
With HttpSolrServer, any error encountered results in an exception, 
but each request blocks until it completes.  If you need both 
concurrent capability and error detection, you would have to manage 
multiple indexing threads yourself.
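
To illustrate what managing the threads yourself could look like, here is a rough sketch using only java.util.concurrent. It is not taken from SolrJ: BatchSender is a hypothetical stand-in for a blocking call that throws on failure (for example, a wrapper around HttpSolrServer.add(...)), and ParallelIndexer and indexAll are names made up for this example. Each batch is sent on a worker thread, and the caller gets back the exceptions from any batches that failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a blocking indexing call that throws when
// the server rejects the batch (not part of SolrJ).
interface BatchSender {
    void send(List<String> batch) throws Exception;
}

class ParallelIndexer {
    // Sends every batch on a fixed pool of worker threads, waits for all of
    // them, and returns the exceptions raised by failed batches.
    // An empty list means every batch was sent without error.
    static List<Exception> indexAll(List<List<String>> batches,
                                    BatchSender sender,
                                    int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (List<String> batch : batches) {
                Callable<Void> task = () -> {
                    sender.send(batch); // blocks; throws on error
                    return null;
                };
                pending.add(pool.submit(task));
            }

            List<Exception> failures = new ArrayList<>();
            for (Future<Void> result : pending) {
                try {
                    result.get(); // wait for this batch to finish
                } catch (ExecutionException e) {
                    // Unwrap the exception thrown inside the worker thread.
                    Throwable cause = e.getCause();
                    failures.add(cause instanceof Exception ? (Exception) cause : e);
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }
}
```

The calling code can then decide whether to retry, log, or abort based on the returned list, which is the part the streaming class does not give you.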

Apparently there is a method in the streaming class (handleError) that 
you can override to handle errors differently, though I have not seen 
how to write code so that your program would know an error occurred.  I 
filed an issue with a patch to solve this, but some of the developers 
have come up with an idea that might be better.  None of the ideas has 
been committed to the project yet.

https://issues.apache.org/jira/browse/SOLR-3284

Just an FYI, the streaming class was renamed to 
ConcurrentUpdateSolrServer in Solr 4.0 Alpha.  Both are available in 3.6.x.

Thanks,
Shawn

