lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Peter A. Daly" <peted...@gmail.com>
Subject Re: Lucene bulk indexing
Date Thu, 21 Apr 2005 12:20:25 GMT
On some systems I have seen big speed increases by indexing to a
RAMDirectory and periodically "merging" into an on disk directory
every X number of docs.  May or may not help in this case.  In the
first case a used this, it took indexing down from a few hours to 30
minutes for a few million documents on a windows 2k desktop machine.

http://www.budget-ha.com/lucene/ram-to-disk/

-Pete

On 4/20/05, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> That sounds way too long, unless you have veeery slow disks, veeery
> large Documents (long fields that you analyze, index, and store in
> Lucene), or some such.
> If you have very loooong fiiiiieeeelds you could try setting
> http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength
> to a very small number and see if that changes performance drastically.
>  There are other IndexWriter knobs you can fiddle with.
> 
> I've seen Hibernate 2.* get sluggish once its Session gets filled up
> with a lot of objects.
> 
> Otis
> 
> 
> --- Aalap Parikh <aloo77@yahoo.com> wrote:
> > Hi,
> >
> > I have similar issues in indexing time.
> >
> > I am doing a SELECT from database and getting back
> > 10,000 rows. I then start indexing each row and hence
> > would have 10,000 documents in my Lucene index. Each
> > doc has 27 fields.
> >
> > I added some timing code to my indexing process. The
> > DB select call takes around 23 seconds and the
> > indexing process takes 567 seconds. Also, I profiled
> > the app using JProfiler and found out that 90% of time
> > is spent in the IndexWriter.addDocument call. As
> > expected, there were 10,000 invocation of that method
> > (one for each doc) and the profiler showed that the
> > method took 90% of the processing time.
> >
> > I am concerned that it is taking around 9.5 minutes
> > for 10,000 docs and I am expecting to have around
> > 600,000 docs to index. So that would take 570 minutes
> > (9-10 hours) to index and which is HUGE!!!
> >
> > My machine: Pentium 4 CPU 2.40 GHz
> >             RAM 1 GB
> >
> > Any help appreciated.
> >
> > Thanks,
> > Aalap.
> >
> >
> > --- skoptelov@fis.ru wrote:
> > > В сообщении от Среда 20
> > > Апрель 2005 04:07 Mufaddal Khumri
> > > написал(a):
> > > > The 20000 products I mentioned are 20000 rows. I
> > > get the products in
> > > > bulk by using a limit clause.
> > > >
> > > > I am using hibernate with MySQL server on a
> > > 2.8GHz, 1.00GB Ram machine.
> > >
> > > Maybe your session-level cache in hibernate grows
> > > incredibly. Do you do
> > > Session.clear() sometimes while doing indexing?
> > > Here's a link about batching
> > > & hibernate:
> > >
> > http://blog.hibernate.org/cgi-bin/blosxom.cgi/2004/08/
> > >
> > >
> > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> > > java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail:
> > > java-user-help@lucene.apache.org
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>
Mime
View raw message