lucene-solr-user mailing list archives

From Johannes.Schwendin...@blum.com
Subject Reply: Re: Solr Cell Questions
Date Tue, 25 Sep 2012 09:23:26 GMT
Thank you, Erick, for your response.

I've already tried what you suggested and got some out-of-memory
exceptions. That is why I like the Solr Cell approach, where I can
stream each file directly to Solr instead of collecting the files in my
own program's memory.
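
A minimal sketch of that streaming approach, using only the Python
standard library. It assumes the stock example Solr on localhost with
Solr Cell mapped to /update/extract; the URL, the generic content type,
and the function names are illustrative assumptions, not taken from my
actual program. The open file is handed over as the request body with an
explicit Content-Length, so it is read and sent in blocks rather than
loaded into memory as a whole.

```python
import os
import urllib.parse
import urllib.request

# Assumed endpoint: example Solr on localhost with the
# ExtractingRequestHandler (Solr Cell) mapped to /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_request(path, doc_id, base_url=SOLR_EXTRACT_URL, commit=False):
    """Build a POST request whose body is the open file, not a bytes copy."""
    params = {"literal.id": doc_id}  # literal.<field> sets a field value
    if commit:
        params["commit"] = "true"
    url = base_url + "?" + urllib.parse.urlencode(params)
    body = open(path, "rb")
    headers = {
        # Generic type (an assumption); Tika normally detects the real format.
        "Content-Type": "application/octet-stream",
        # With an explicit length, the file object is sent block by block.
        "Content-Length": str(os.path.getsize(path)),
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send_file(path, doc_id, **kwargs):
    """Stream one file to Solr Cell and return the HTTP status code."""
    request = build_extract_request(path, doc_id, **kwargs)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    finally:
        request.data.close()
```

Calling send_file("report.pdf", "doc-1", commit=True) would post a single
file; for bulk loads it is probably better to commit once at the end
than per document.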

Another question that came to mind: how many documents per minute (or
per second) can I put into Solr? Say XML format, from 100 KB to 100 MB
each.
Is there a typical number, or does it depend too much on hardware and
settings?


Best
Johannes

Erick Erickson <erickerickson@gmail.com> wrote on 25.09.2012 00:22:26:

> From: Erick Erickson <erickerickson@gmail.com>
> To: solr-user@lucene.apache.org
> Date: 25.09.2012 00:23
> Subject: Re: Solr Cell Questions
> 
> If you're concerned about throughput, consider moving all the
> SolrCell (Tika) processing off the server. SolrCell is way cool
> for showing what can be done, but its downside is you're
> moving all the processing of the structured documents to the
> same machine doing the indexing. Pretty soon, especially
> with significant size files, you're spending all your CPU cycles
> parsing the files...
> 
> Happens there's a blog about this:
> http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
> 
> By moving the indexing to N clients, you can increase
> throughput until you make Solr work hard to do the indexing....
> 
> Best
> Erick
> 
> On Mon, Sep 24, 2012 at 10:04 AM, <Johannes.Schwendinger@blum.com> wrote:
> > Hi,
> >
> > I'm currently experimenting with Solr Cell to index files into Solr.
> > Along the way some questions came up.
> >
> > 1. Is it possible (and wise) to connect to Solr Cell with multiple
> > threads at the same time, to index several documents concurrently?
> > This question came up because my program takes about 6 hours to index
> > around 35,000 docs. (This is no production environment, only the example
> > Solr on a little desktop machine, but I think it's very slow, and I know
> > Solr isn't the bottleneck (yet).)
> >
> > 2. If 1 is possible, how many threads should do this, and how much
> > memory does Solr need? I've tried it, but I ran into an out-of-memory
> > exception.
> >
> > Thanks in advance
> >
> > Best Regards
> > Johannes
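
On question 1 in the quoted message (several threads indexing at the
same time), a minimal sketch of a fixed-size worker pool in Python. The
names index_all and send are hypothetical; send stands for whatever
function posts one file to Solr. Only as many files as there are workers
are in flight at once, which keeps client-side memory bounded, and a
failure on one file does not stop the rest.

```python
from concurrent.futures import ThreadPoolExecutor


def index_all(paths, send, workers=4):
    """Call send(path) for every path using a fixed pool of worker threads.

    Returns (succeeded, failed); failed holds (path, exception) pairs.
    """
    succeeded, failed = [], []

    def attempt(path):
        # Catch per-file errors so one bad document does not abort the run.
        try:
            send(path)
            return path, None
        except Exception as exc:
            return path, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order; at most `workers` run at once.
        for path, error in pool.map(attempt, paths):
            if error is None:
                succeeded.append(path)
            else:
                failed.append((path, error))
    return succeeded, failed
```

The right number of workers is not something this sketch can decide; it
would have to be found by raising it until the client or Solr becomes
the bottleneck.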
