lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: fastest way to load documents
Date Sat, 02 Aug 2008 04:04:04 GMT
Increase maxBufferedDocs and don't go above 50 with mergeFactor.  Exactly what the best
numbers are depends on things I can't see from here.  -Xmx5000m is good if you care about
fast indexing, but not the best choice when searching.  How fast you can index also depends
on the size of your documents and the analysis being done on them.
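In solrconfig.xml terms, that advice maps onto the <indexDefaults> section. A rough sketch (the values here are illustrative placeholders, not tuned recommendations; the right numbers depend on your heap and document sizes):

```xml
<!-- sketch of the relevant indexDefaults knobs in solrconfig.xml;
     values below are illustrative, not tuned recommendations -->
<indexDefaults>
  <!-- buffer more documents in RAM before flushing a segment to disk -->
  <maxBufferedDocs>10000</maxBufferedDocs>
  <!-- keep mergeFactor modest; going above ~50 stops paying off -->
  <mergeFactor>25</mergeFactor>
</indexDefaults>
```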

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
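On the loading side, one way to tighten the cat/grep merging quoted below is to strip each file's own <add> wrapper and re-wrap the whole batch exactly once, so the merged payload stays well-formed XML. A sketch using three toy files in /tmp/docs (file paths and contents are assumptions standing in for the real 1.xml ... 17M.xml):

```shell
#!/bin/sh
# Sketch: build one well-formed <add> batch from many per-document files.
# The toy files created here stand in for the real 1.xml ... 17M.xml.
mkdir -p /tmp/docs
for i in 1 2 3; do
  printf '<add><doc><field name="id">%s</field></doc></add>\n' "$i" \
    > /tmp/docs/$i.xml
done

out=/tmp/post.xml
printf '<add>\n' > "$out"
for f in /tmp/docs/*.xml; do
  # drop each file's own <add>...</add> wrapper, keep the <doc> payload
  sed -e 's/<add>//' -e 's#</add>##' "$f" >> "$out"
done
printf '</add>\n' >> "$out"

# the merged batch would then be posted as in the thread:
# curl -d @"$out" 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml'
```

Posting fewer, larger batches over one connection keeps per-request HTTP overhead out of the per-document cost.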



----- Original Message ----
> From: Ian Connor <ian.connor@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Friday, August 1, 2008 5:08:00 PM
> Subject: Re: fastest way to load documents
> 
> I am on fedora and just running with jetty (I guess that means it will
> not just use as much RAM as I have and I need to specify it when I
> load java).
> 
> So, if I have 8GB RAM, are you suggesting that I set -Xmx5000m or
> something large and then set merge to:
> 
> <mergeFactor>10000</mergeFactor>
> 
> should I also increase any of these?
> 
> 
>     <maxBufferedDocs>10000</maxBufferedDocs>
>     <maxMergeDocs>2147483647</maxMergeDocs>
>     <maxFieldLength>10000</maxFieldLength>
>     <writeLockTimeout>1000</writeLockTimeout>
>     <commitLockTimeout>10000</commitLockTimeout>
> 
> and play with this to optimize?
> 
> 3000/s is my theoretical maximum. I cannot cat/grep and pass the docs
> to curl any faster than that. 100/s seems to be how fast solr can
> index - I just want to know what to tweak to see if this can be
> increased.
> 
> On Fri, Aug 1, 2008 at 4:37 PM, Otis Gospodnetic
> wrote:
> > Configure Solr to use as much RAM as you can afford and not merge too
> > often via mergeFactor.
> > It's not clear (to me) from your explanation when you see 3000 docs/second
> > and when only 100 docs/second.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > ----- Original Message ----
> >> From: Ian Connor 
> >> To: solr-user@lucene.apache.org
> >> Sent: Friday, August 1, 2008 3:36:13 PM
> >> Subject: fastest way to load documents
> >>
> >> I have a number of documents in files
> >>
> >> 1.xml
> >> 2.xml
> >> ...
> >> 17M.xml
> >>
> >> I have been using cat to join them all together:
> >>
> >> cat 1.xml 2.xml ... 1000.xml  | grep -v '<\/add>' > /tmp/post.xml
> >>
> >> and posting them with curl:
> >>
> >> curl -d @/tmp/post.xml 'http://localhost:8983/solr/update' -H
> >> 'Content-Type: text/xml'
> >>
> >> Is there a faster way to load up these documents into a number of solr
> >> shards? I seem to be able to cover 3000/second just catting them
> >> together (2500 at a time is the sweet spot for me) - but this slows
> >> down to under 100/s once I try to do the post with curl.
> >>
> >> --
> >> Regards,
> >>
> >> Ian Connor
> >
> >
> 
> 
> 
> -- 
> Regards,
> 
> Ian Connor
> 82 Fellsway W #2
> Somerville, MA 02145
> Direct Line: +1 (978) 6333372
> Call Center Phone: +1 (714) 239 3875 (24 hrs)
> Mobile Phone: +1 (312) 218 3209
> Fax: +1(770) 818 5697
> Suisse Phone: +41 (0) 22 548 1664
> Skype: ian.connor

