lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aalap Parikh <alo...@yahoo.com>
Subject Re: Lucene bulk indexing
Date Thu, 21 Apr 2005 23:50:14 GMT
Hi,

Thanks for your suggestion. I haven't yet tried your
technique but I did try something similar by tweaking
some Indexwriter properties like mergeFactor and
minMergeDocs and it did certainly speed up the process
a lot. I am sure the same can be achieved with what
you suggest because it is essentially doing the same
thing, namely, doing more in-memory store before
writing the index out to disk.

Thanks,
Aalap.

--- "Peter A. Daly" <petedaly@gmail.com> wrote:
> On some systems I have seen big speed increases by
> indexing to a
> RAMDirectory and periodically "merging" into an on
> disk directory
> every X number of docs.  May or may not help in this
> case.  In the
> first case a used this, it took indexing down from a
> few hours to 30
> minutes for a few million documents on a windows 2k
> desktop machine.
> 
> http://www.budget-ha.com/lucene/ram-to-disk/
> 
> -Pete
> 
> On 4/20/05, Otis Gospodnetic
> <otis_gospodnetic@yahoo.com> wrote:
> > That sounds way too long, unless you have veeery
> slow disks, veeery
> > large Documents (long fields that you analyze,
> index, and store in
> > Lucene), or some such.
> > If you have very loooong fiiiiieeeelds you could
> try setting
> >
>
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength
> > to a very small number and see if that changes
> performance drastically.
> >  There are other IndexWriter knobs you can fiddle
> with.
> > 
> > I've seen Hibernate 2.* get sluggish once its
> Session gets filled up
> > with a lot of objects.
> > 
> > Otis
> > 
> > 
> > --- Aalap Parikh <aloo77@yahoo.com> wrote:
> > > Hi,
> > >
> > > I have similar issues in indexing time.
> > >
> > > I am doing a SELECT from database and getting
> back
> > > 10,000 rows. I then start indexing each row and
> hence
> > > would have 10,000 documents in my Lucene index.
> Each
> > > doc has 27 fields.
> > >
> > > I added some timing code to my indexing process.
> The
> > > DB select call takes around 23 seconds and the
> > > indexing process takes 567 seconds. Also, I
> profiled
> > > the app using JProfiler and found out that 90%
> of time
> > > is spent in the IndexWriter.addDocument call. As
> > > expected, there were 10,000 invocation of that
> method
> > > (one for each doc) and the profiler showed that
> the
> > > method took 90% of the processing time.
> > >
> > > I am concerned that it is taking around 9.5
> minutes
> > > for 10,000 docs and I am expecting to have
> around
> > > 600,000 docs to index. So that would take 570
> minutes
> > > (9-10 hours) to index and which is HUGE!!!
> > >
> > > My machine: Pentium 4 CPU 2.40 GHz
> > >             RAM 1 GB
> > >
> > > Any help appreciated.
> > >
> > > Thanks,
> > > Aalap.
> > >
> > >
> > > --- skoptelov@fis.ru wrote:
> > > > В сообщении от Среда 20
> > > > Апрель 2005 04:07 Mufaddal Khumri
> > > > написал(a):
> > > > > The 20000 products I mentioned are 20000
> rows. I
> > > > get the products in
> > > > > bulk by using a limit clause.
> > > > >
> > > > > I am using hibernate with MySQL server on a
> > > > 2.8GHz, 1.00GB Ram machine.
> > > >
> > > > Maybe your session-level cache in hibernate
> grows
> > > > incredibly. Do you do
> > > > Session.clear() sometimes while doing
> indexing?
> > > > Here's a link about batching
> > > > & hibernate:
> > > >
> > >
>
http://blog.hibernate.org/cgi-bin/blosxom.cgi/2004/08/
> > > >
> > > >
> > >
>
---------------------------------------------------------------------
> > > > To unsubscribe, e-mail:
> > > > java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail:
> > > > java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > >
>
---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail:
> java-user-help@lucene.apache.org
> > >
> > >
> > 
> >
>
---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail:
> java-user-help@lucene.apache.org
> > 
> >
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message