lucene-solr-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: Solr maximum Optimal Index Size per Shard
Date Fri, 06 Jun 2014 12:25:34 GMT
On Fri, 2014-06-06 at 14:05 +0200, Vineet Mishra wrote:

> Could you state what indexing mechanism are you using, as I started
> with EmbeddedSolrServer but it was pretty slow after a few GB(~30+) of
> indexing.

I suspect that is due to too-frequent commits, too small a heap or some
third cause, unrelated to EmbeddedSolrServer itself. Under the surface
it is just the same as a standalone Solr.

We're building our ~1TB indexes individually, using standalone workers
for the heavy part of the analysis (Tika). The delivery from the workers
to the Solr server is over the network, using the Solr binary protocol.
My colleague Thomas Egense just created a small write-up at
https://github.com/netarchivesuite/netsearch

>  I started indexing 1 week back and still its 37GB, although I assume
> HttpPost mechanism will perform lethargic slow due to network latency
> and for the response await.

That may be true if you send the documents one at a time, but if you
bundle them into larger updates, the POST method should be fine.
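
As a rough illustration of bundling, here is a minimal Python sketch that
groups documents into batches and sends each batch as one HTTP POST to
Solr's JSON update handler. It is not the setup described above (that
uses the Solr binary protocol). The update URL, the core name, the batch
size of 1000 and the commitWithin value of 60 seconds are assumptions
chosen for the example; tune them for your own installation.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_update_request(update_url, chunk, commit_within_ms=60000):
    """Build one HTTP POST that carries a whole batch as a JSON array.

    commitWithin asks Solr to commit within the given number of
    milliseconds, so the client does not issue a commit per request.
    The 60000 ms default is an assumption for this example.
    """
    url = f"{update_url}?commitWithin={commit_within_ms}"
    body = json.dumps(chunk).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_all(update_url, docs, batch_size=1000):
    """Send all documents in batches; return the number of requests made."""
    sent = 0
    for chunk in batches(docs, batch_size):
        request = build_update_request(update_url, chunk)
        with urllib.request.urlopen(request) as response:
            response.read()  # drain the response before the next batch
        sent += 1
    return sent
```

With a hypothetical local core this would be called as
index_all("http://localhost:8983/solr/collection1/update", docs). Each
request then pays the network round trip once per 1000 documents rather
than once per document.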

- Toke Eskildsen, State and University Library, Denmark


