lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Garrett Heaver" <>
Subject RE: Indexing a large number of DB records
Date Wed, 15 Dec 2004 17:44:41 GMT
Hi Homan

I had a similar problem as you in that I was indexing A LOT of data

Essentially how I got round it was to batch the index.

What I was doing was to add 10,000 documents to a temporary index, use
addIndexes() to merge to temporary index into the live index (which also
optimizes the live index) then delete the temporary index. On the next loop
I'd only query rows from the db above the id in the maxdoc of the live index
and set the max rows of the query to to 10,000

SELECT TOP 10000 [fields] FROM [tables] WHERE [id_field] > {ID from
Index.MaxDoc()} ORDER BY [id_field] ASC

Ensuring that the documents go into the index sequentially your problem is
solved and memory usage on mine (dotlucene 1.3) is low


-----Original Message-----
From: Homam S.A. [] 
Sent: 15 December 2004 02:43
To: Lucene Users List
Subject: Indexing a large number of DB records

I'm trying to index a large number of records from the
DB (a few millions). Each record will be stored as a
document with about 30 fields, most of them are
UnStored and represent small strings or numbers. No
huge DB Text fields.

But I'm running out of memory very fast, and the
indexing is slowing down to a crawl once I hit around
1500 records. The problem is each document is holding
references to the string objects returned from
ToString() on the DB field, and the IndexWriter is
holding references to all these document objects in
memory, so the garbage collector is getting a chance
to clean these up.

How do you guys go about indexing a large DB table?
Here's a snippet of my code (this method is called for
each record in the DB):

private void IndexRow(SqlDataReader rdr, IndexWriter
iw) {
	Document doc = new Document();
	for (int i = 0; i < BrowseFieldNames.Length; i++) {

Do you Yahoo!? 
Yahoo! Mail - Find what you need with new enhanced search.

To unsubscribe, e-mail:
For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message