lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Solr Cell Questions
Date Tue, 25 Sep 2012 14:36:06 GMT
Maybe we should even contemplate direct support for Tika/SolrCell in SolrJ -
call it SolrJCell. This might also make it a lot easier for apps to apply
post-processing after a document is parsed but before the data is sent to Solr.

And maybe even have an option for multi-process support (invoke Tika as a 
separate process) to minimize thread issues, GC issues, hung parsers, etc.

-- Jack Krupansky

-----Original Message----- 
From: Alexandre Rafalovitch
Sent: Tuesday, September 25, 2012 10:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cell Questions

Are you by any chance committing after every file that is indexed? That
could cause the speed issues.
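
A minimal sketch of the alternative, assuming documents can be sent in
batches with one commit at the end. IndexClient is a hypothetical
stand-in for whatever client talks to Solr (it is not a SolrJ type), and
the batch size is an arbitrary choice:

```java
import java.util.ArrayList;
import java.util.List;

final class BatchCommitSketch {

    /** Hypothetical stand-in for a Solr client; not part of SolrJ. */
    interface IndexClient {
        void add(List<String> docs);
        void commit();
    }

    /**
     * Sends docs in batches of batchSize and commits once at the end,
     * instead of committing after every document.
     * Returns the number of batches sent.
     */
    static int indexAll(List<String> docs, int batchSize, IndexClient client) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<String> batch = new ArrayList<>(batchSize);
        for (String doc : docs) {
            batch.add(doc);
            if (batch.size() == batchSize) {
                client.add(new ArrayList<>(batch)); // hand over a copy
                batch.clear();
                batches++;
            }
        }
        if (!batch.isEmpty()) {
            client.add(new ArrayList<>(batch)); // final partial batch
            batches++;
        }
        client.commit(); // single commit for the whole run
        return batches;
    }
}
```

With 35000 docs and a batch size of 500 this would mean 70 add calls and
one commit, rather than 35000 commits.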

Also, have you tried tuning your indexer's Java memory parameters? I
use this for mine, which used to run out of memory as well:
java -server -Xms512m -Xmx2048m

Regards,
   Alex.
P.s. I may have some issues with mine still, so this is just a
direction hint, not a full answer.
P.p.s. I have not tried this, but you may be able to run multiple
Tika instances in parallel queues/processes and then feed their output
into a single queue that sends to Solr.
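
A rough sketch of that layout, using threads rather than processes and
only the JDK. The extractor function is a hypothetical placeholder for
the Tika parsing step, the returned list stands in for "send to Solr",
and the queue capacity of 100 is an arbitrary assumption. The bounded
queue keeps fast parsers from piling up extracted text in memory:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class ParallelExtractSketch {

    // End-of-stream marker, compared by identity, never by content.
    private static final String END = new String("__END__");

    /**
     * Runs extractor on every path using `workers` threads and funnels
     * the results through one bounded queue to a single consumer (the
     * calling thread). Files whose extraction throws are skipped.
     * Result order is not guaranteed.
     */
    static List<String> run(List<String> paths, int workers,
                            Function<String, String> extractor) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (String path : paths) {
            pool.submit(() -> {
                try {
                    queue.put(extractor.apply(path)); // blocks when full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown(); // accept no new tasks; queued ones still run

        // Once every worker has finished, signal the consumer to stop.
        Thread closer = new Thread(() -> {
            try {
                pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        closer.setDaemon(true);
        closer.start();

        // Single consumer: this is where documents would go to Solr.
        List<String> received = new ArrayList<>();
        try {
            for (String text = queue.take(); text != END; text = queue.take()) {
                received.add(text);
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return received;
    }
}
```

A call such as ParallelExtractSketch.run(paths, 4, myExtractor) would use
four parser threads; how many actually help depends on cores and heap.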

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Sep 24, 2012 at 10:04 AM,  <Johannes.Schwendinger@blum.com> wrote:
> Hi,
>
> I'm currently experimenting with Solr Cell to index files to Solr. During
> this some questions came up.
>
> 1. Is it possible (and wise) to connect to Solr Cell with multiple threads
> at the same time to index several documents at the same time?
> This question came up because my program takes about 6 hours to index
> around 35000 docs. (No production environment, only the example Solr and a
> little desktop machine, but I think it's very slow, and I know Solr isn't
> the bottleneck (yet).)
>
> 2. If 1 is possible, how many threads should do this and how much memory
> does Solr need? I've tried it but I run into an out of memory exception.
>
> Thanks in advance
>
> Best Regards
> Johannes 

