lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Lucene bulk indexing
Date Thu, 21 Apr 2005 01:40:57 GMT
That sounds way too long, unless you have veeery slow disks, veeery
large Documents (long fields that you analyze, index, and store in
Lucene), or some such.
If you have very loooong fiiiiieeeelds you could try setting
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength
to a very small number and see if that changes performance drastically.
There are other IndexWriter knobs you can fiddle with, such as
mergeFactor and minMergeDocs.
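
One way to tell whether a setting like this changes performance
drastically is to compare projected totals from two timed sample runs.
What follows is a minimal sketch in plain Java, not Lucene code: the
class IndexingEstimate and its methods are made up for illustration, and
the sample figures (10,000 docs in 567 seconds, 600,000 docs expected)
come from the message quoted below.

```java
// Plain-Java sketch; no Lucene classes are used here.
// IndexingEstimate is a hypothetical helper, not part of any library.
final class IndexingEstimate {
    private IndexingEstimate() {}

    /** Projects total seconds for docsTotal documents from a timed sample run. */
    static double projectSeconds(long docsMeasured, double secondsMeasured, long docsTotal) {
        if (docsMeasured <= 0) {
            throw new IllegalArgumentException("docsMeasured must be positive");
        }
        // Multiply before dividing so whole-number inputs stay exact.
        return secondsMeasured * docsTotal / docsMeasured;
    }

    /** Ratio of baseline time to tuned time; well above 1 means the change helped. */
    static double speedup(double baselineSeconds, double tunedSeconds) {
        if (tunedSeconds <= 0) {
            throw new IllegalArgumentException("tunedSeconds must be positive");
        }
        return baselineSeconds / tunedSeconds;
    }

    public static void main(String[] args) {
        // Figures from the quoted message: 10,000 docs took 567 s; 600,000 docs expected.
        double total = projectSeconds(10_000, 567.0, 600_000);
        System.out.printf("projected: %.0f s = %.0f min = %.2f h%n",
                total, total / 60, total / 3600);
    }
}
```

With those figures the projection is 34,020 seconds, i.e. 567 minutes or
about 9.45 hours. Timing the same 10,000-doc sample again after changing
maxFieldLength and passing both times to speedup would show whether long
fields are the bottleneck.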

I've seen Hibernate 2.* get sluggish once its Session gets filled up
with a lot of objects.
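
On the Hibernate side, the usual remedy is to process rows in fixed-size
batches and empty the Session after each batch. The sketch below shows
only the batching pattern in plain Java: BatchRunner is a hypothetical
helper, and the afterBatch callback marks the spot where one would
presumably call session.flush() and session.clear(). The batch size is
an arbitrary choice to tune.

```java
import java.util.function.Consumer;

// Hypothetical helper illustrating batch processing; it contains no Hibernate calls.
final class BatchRunner {
    private BatchRunner() {}

    /**
     * Applies perItem to every element and runs afterBatch once per full batch,
     * plus once more for a trailing partial batch.
     * Returns the number of times afterBatch ran.
     */
    static <T> int forEachInBatches(Iterable<T> items, int batchSize,
                                    Consumer<? super T> perItem, Runnable afterBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int inBatch = 0;
        int batches = 0;
        for (T item : items) {
            perItem.accept(item);      // e.g. build a Document and add it to the index
            if (++inBatch == batchSize) {
                afterBatch.run();      // assumed place for session.flush(); session.clear();
                batches++;
                inBatch = 0;
            }
        }
        if (inBatch > 0) {             // do not leave the last partial batch uncleared
            afterBatch.run();
            batches++;
        }
        return batches;
    }
}
```

A call would look something like
forEachInBatches(rows, 500, row -> index(row), () -> session.clear()),
where rows, index and session are the caller's own objects.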

Otis


--- Aalap Parikh <aloo77@yahoo.com> wrote:
> Hi,
> 
> I have similar issues with indexing time.
> 
> I am doing a SELECT from the database and getting back
> 10,000 rows. I then index each row, so I end up with
> 10,000 documents in my Lucene index. Each doc has
> 27 fields.
> 
> I added some timing code to my indexing process. The
> DB select call takes around 23 seconds and the
> indexing process takes 567 seconds. I also profiled
> the app using JProfiler and found that 90% of the time
> is spent in the IndexWriter.addDocument call. As
> expected, there were 10,000 invocations of that method
> (one for each doc).
> 
> I am concerned that it takes around 9.5 minutes
> for 10,000 docs, and I expect to have around
> 600,000 docs to index. At that rate it would take
> about 570 minutes (9-10 hours), which is HUGE!!!
> 
> My machine: Pentium 4 CPU 2.40 GHz
>             RAM 1 GB
> 
> Any help appreciated.
> 
> Thanks,
> Aalap.
> 
> 
> --- skoptelov@fis.ru wrote:
> > On Wednesday 20 April 2005 04:07, Mufaddal Khumri wrote:
> > > The 20000 products I mentioned are 20000 rows. I get the products in
> > > bulk by using a limit clause.
> > >
> > > I am using hibernate with MySQL server on a 2.8GHz, 1.00GB Ram machine.
> > 
> > Maybe your session-level cache in Hibernate grows very
> > large. Do you call Session.clear() periodically while
> > indexing? Here's a link about batching and Hibernate:
> > http://blog.hibernate.org/cgi-bin/blosxom.cgi/2004/08/
> > 
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> > java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail:
> > java-user-help@lucene.apache.org
> > 
> > 
