lucene-java-user mailing list archives

From Aalap Parikh <alo...@yahoo.com>
Subject Re: Lucene bulk indexing
Date Thu, 21 Apr 2005 00:35:51 GMT
Hi,

I have similar issues with indexing time.

I am doing a SELECT from the database and getting back
10,000 rows. I then index each row, so my Lucene index
ends up with 10,000 documents. Each doc has 27 fields.

I added some timing code to my indexing process. The
DB SELECT call takes around 23 seconds and the
indexing itself takes 567 seconds. I also profiled
the app with JProfiler and found that 90% of the time
is spent in IndexWriter.addDocument. As expected, that
method was invoked 10,000 times (once per doc).
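A minimal sketch of the kind of timing described above. It assumes each row arrives as a String[] of column values and that each column becomes one field. DocumentSink is a hypothetical stand-in for IndexWriter.addDocument (it is not Lucene's API), and the names field0..field26 are placeholders. The sketch only separates time spent building documents from time spent adding them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexTiming {
    // Hypothetical stand-in for IndexWriter.addDocument; NOT Lucene's API.
    interface DocumentSink {
        void add(Map<String, String> doc);
    }

    // Accumulated nanoseconds for the two phases of the per-row loop.
    static final class Timings {
        long buildNanos;
        long addNanos;
        int docs;

        // Fraction of measured time spent inside the sink's add call.
        double addShare() {
            long total = buildNanos + addNanos;
            return total == 0 ? 0.0 : (double) addNanos / total;
        }
    }

    // One document per row; column i becomes a field named "field" + i.
    static Timings indexRows(List<String[]> rows, DocumentSink sink) {
        Timings t = new Timings();
        for (String[] row : rows) {
            long start = System.nanoTime();
            Map<String, String> doc = new LinkedHashMap<>();
            for (int i = 0; i < row.length; i++) {
                doc.put("field" + i, row[i]);
            }
            long built = System.nanoTime();
            sink.add(doc); // the call the profiler flagged
            long added = System.nanoTime();
            t.buildNanos += built - start;
            t.addNanos += added - built;
            t.docs++;
        }
        return t;
    }

    public static void main(String[] args) {
        // Synthetic rows with 27 columns, matching the doc shape described above.
        List<String[]> rows = new ArrayList<>();
        for (int r = 0; r < 1000; r++) {
            String[] row = new String[27];
            for (int c = 0; c < 27; c++) row[c] = "r" + r + "c" + c;
            rows.add(row);
        }
        List<Map<String, String>> collected = new ArrayList<>();
        Timings t = indexRows(rows, collected::add);
        System.out.printf("docs=%d add share=%.1f%%%n", t.docs, 100 * t.addShare());
    }
}
```

With a real IndexWriter behind the sink, addShare() is the number to compare against the profiler's 90% figure.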

I am concerned that it takes around 9.5 minutes for
10,000 docs, and I expect to have around 600,000 docs
to index. At that rate indexing would take about 570
minutes (9-10 hours), which is HUGE!!!
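The 570-minute figure is a linear extrapolation. Here is a small sketch of that arithmetic. It assumes the per-document cost measured on the first 10,000 docs stays constant as the index grows, which the timings above do not establish.

```java
public class IndexEstimate {
    // Linear extrapolation: assumes a constant per-document cost.
    // Multiplying before dividing keeps the example inputs exact in double arithmetic.
    static double projectedSeconds(double measuredSeconds, long measuredDocs, long targetDocs) {
        return measuredSeconds * targetDocs / measuredDocs;
    }

    public static void main(String[] args) {
        double secs = projectedSeconds(567, 10_000, 600_000);
        System.out.printf("%.0f s = %.0f min = %.2f h%n", secs, secs / 60, secs / 3600);
    }
}
```

For 567 s over 10,000 docs this gives 34,020 s, i.e. 567 minutes or roughly 9.5 hours.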

My machine: Pentium 4 CPU 2.40 GHz
            RAM 1 GB

Any help appreciated.

Thanks,
Aalap.


--- skoptelov@fis.ru wrote:
> On Wednesday, 20 April 2005 04:07, Mufaddal Khumri
> wrote:
> > The 20000 products I mentioned are 20000 rows. I get
> > the products in bulk by using a limit clause.
> >
> > I am using Hibernate with MySQL server on a 2.8GHz,
> > 1.00GB RAM machine.
> 
> Maybe your session-level cache in Hibernate is growing
> enormously. Do you call Session.clear() periodically
> while indexing? Here's a link about batching and
> Hibernate:
> http://blog.hibernate.org/cgi-bin/blosxom.cgi/2004/08/
> 
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
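The batching advice above can be sketched as follows: load rows a page at a time (the limit clause Mufaddal mentions) and clear the session after each page, so the first-level cache never holds more than one page of entities. FakeSession is a hypothetical stand-in that simulates a session cache in memory. It is not Hibernate's API; with Hibernate, the clear step would be the Session.clear() call suggested above. The page size of 500 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedIndexing {
    // Hypothetical in-memory stand-in for a session with a first-level cache.
    // NOT Hibernate's API; it only simulates entities accumulating until clear().
    static final class FakeSession {
        private final Map<Integer, String> cache = new HashMap<>();
        int maxCacheSize;

        // Simulates "SELECT ... LIMIT limit OFFSET offset" over totalRows rows;
        // every loaded row stays cached until clear() is called.
        List<String> loadPage(int offset, int limit, int totalRows) {
            List<String> page = new ArrayList<>();
            for (int id = offset; id < Math.min(offset + limit, totalRows); id++) {
                String row = "product-" + id;
                cache.put(id, row);
                page.add(row);
            }
            maxCacheSize = Math.max(maxCacheSize, cache.size());
            return page;
        }

        void clear() {
            cache.clear();
        }
    }

    // Pages through all rows and clears the session after each page.
    static int indexAll(FakeSession session, int totalRows, int pageSize, List<String> indexed) {
        int offset = 0;
        while (true) {
            List<String> page = session.loadPage(offset, pageSize, totalRows);
            if (page.isEmpty()) break;
            indexed.addAll(page); // placeholder for building and adding Lucene documents
            session.clear();      // drop this page's entities from the cache
            offset += page.size();
        }
        return offset;
    }

    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        List<String> indexed = new ArrayList<>();
        int n = indexAll(session, 20_000, 500, indexed);
        System.out.println("indexed=" + n + " maxCache=" + session.maxCacheSize);
    }
}
```

For 20,000 rows and a page size of 500, maxCacheSize stays at 500 instead of growing to 20,000.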


