lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Herlitz <dan...@dhlz.com>
Subject Re: Lucene bulk indexing
Date Tue, 19 Apr 2005 21:16:12 GMT
Agree. We run an index with about 2.5 million documents and around 30 
fields. The indexing itself of 20000 items should only take a few 
seconds on a reasonably fast machine.

/D

Kevin L. Cobb wrote:

>I think your bottleneck is most likely the DB hit. I assume by 20000
>products you mean 20000 distinct entries into the Lucene Index, i.e.
>20000 rows in the DB to select from. 
>
>I index about 1.5 million rows from a SQL Server 2000 database with
>several fields for each entry and it finishes in about twenty minutes so
>you should be able to index 20000 rows in a few seconds. 
>
>Make sure your database table(s) are indexed appropriately according to
>your select statements. Indexing correctly will be the biggest
>performance improvement you will see. 
>
>Best of luck. 
>
>KLCobb
>
>-----Original Message-----
>From: Mufaddal Khumri [mailto:MKhumri@allegromedical.com] 
>Sent: Tuesday, April 19, 2005 2:11 PM
>To: java-user@lucene.apache.org
>Subject: Lucene bulk indexing
>
>Hi,
>
>I am sure this question must be raised before and maybe it has been even
>answered. I would be grateful, if someone could point me in the right
>direction or give their thoughts on this topic.
>
>The problem:
>
>I have approximately over 20000 products that I need to index. At the
>moment I get X number of products at a time and index them. This process
>takes about 26 minutes (Am indexing the database id, product name,
>product description).
>
>I was thinking of ways to make this indexing faster. For this I was
>thinking about writing a threaded module that would index X number of
>products simultaneously. For instance I could spawn (Number of
>products/X) number of threads and do the indexing. I am guessing this
>would be faster but by what factor would this be faster? (I understand
>the writes to the index are synchronized by lucene).
>
>Is there any other approach by which I could speed up the indexing?
>Thoughts? Suggestions?
>
>Thanks,
>Mufaddal.
>
>
>------------------------------------------------------------------------
>------------------
>This email and any files transmitted with it are confidential 
>and intended solely for the use of the individual or entity 
>to whom they are addressed. If you have received this 
>email in error please notify the system manager. Please
>note that any views or opinions presented in this email 
>are solely those of the author and do not necessarily
>represent those of the company. Finally, the recipient
>should check this email and any attachments for the 
>presence of viruses. The company accepts no liability for
>any damage caused by any virus transmitted by this email.
>Consult your physician prior to the use of any medical
>supplies or product.
>------------------------------------------------------------------------
>------------------
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
>  
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message