lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aalap Parikh <alo...@yahoo.com>
Subject Re: Lucene bulk indexing
Date Thu, 21 Apr 2005 23:47:08 GMT
My machine is pretty good and fairly new. The disk for
sure is not slow and also I am not indexing large
Documents; 27 fields with each field value being a
string with no more than 15-20 characters long.

I tried setting the maxFieldLength value of the
Indexwriter to a low value but that didn't help.

Also, I am not using Hibernate at all.

Thanks,
Aalap.

--- Otis Gospodnetic <otis_gospodnetic@yahoo.com>
wrote:
> That sounds way too long, unless you have veeery
> slow disks, veeery
> large Documents (long fields that you analyze,
> index, and store in
> Lucene), or some such.
> If you have very loooong fiiiiieeeelds you could try
> setting
>
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength
> to a very small number and see if that changes
> performance drastically.
>  There are other IndexWriter knobs you can fiddle
> with.
> 
> I've seen Hibernate 2.* get sluggish once its
> Session gets filled up
> with a lot of objects.
> 
> Otis
> 
> 
> --- Aalap Parikh <aloo77@yahoo.com> wrote:
> > Hi,
> > 
> > I have similar issues in indexing time.
> > 
> > I am doing a SELECT from database and getting back
> > 10,000 rows. I then start indexing each row and
> hence
> > would have 10,000 documents in my Lucene index.
> Each
> > doc has 27 fields.
> > 
> > I added some timing code to my indexing process.
> The
> > DB select call takes around 23 seconds and the
> > indexing process takes 567 seconds. Also, I
> profiled
> > the app using JProfiler and found out that 90% of
> time
> > is spent in the IndexWriter.addDocument call. As
> > expected, there were 10,000 invocation of that
> method
> > (one for each doc) and the profiler showed that
> the
> > method took 90% of the processing time.
> > 
> > I am concerned that it is taking around 9.5
> minutes
> > for 10,000 docs and I am expecting to have around
> > 600,000 docs to index. So that would take 570
> minutes
> > (9-10 hours) to index and which is HUGE!!!
> > 
> > My machine: Pentium 4 CPU 2.40 GHz
> >             RAM 1 GB
> > 
> > Any help appreciated.
> > 
> > Thanks,
> > Aalap.
> > 
> > 
> > --- skoptelov@fis.ru wrote:
> > > В сообщении от Среда 20
> > > Апрель 2005 04:07 Mufaddal Khumri
> > > написал(a):
> > > > The 20000 products I mentioned are 20000 rows.
> I
> > > get the products in
> > > > bulk by using a limit clause.
> > > >
> > > > I am using hibernate with MySQL server on a
> > > 2.8GHz, 1.00GB Ram machine.
> > > 
> > > Maybe your session-level cache in hibernate
> grows
> > > incredibly. Do you do 
> > > Session.clear() sometimes while doing indexing?
> > > Here's a link about batching 
> > > & hibernate:
> > >
> >
>
http://blog.hibernate.org/cgi-bin/blosxom.cgi/2004/08/
> > > 
> > >
> >
>
---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> > > java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail:
> > > java-user-help@lucene.apache.org
> > > 
> > > 
> > 
> >
>
---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail:
> java-user-help@lucene.apache.org
> > 
> > 
> 
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail:
> java-user-help@lucene.apache.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message