lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Indexing a large number of DB records
Date Wed, 15 Dec 2004 03:05:17 GMT
Hello,

There are a few things you can do:

1) Don't just pull all rows from the DB at once.  Do that in batches.

2) If you can get a Reader from your SqlDataReader, consider this:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/document/Field.html#Text(java.lang.String,%20java.io.Reader)

3) If the indexer runs on a JVM, give it more memory with the -Xms
(initial heap) and -Xmx (maximum heap) parameters.

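4) See IndexWriter's minMergeDocs parameter.  It sets how many documents
are buffered in memory before they are written out as a new segment, so
a larger value speeds up indexing but uses more RAM.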

5) Are you calling optimize() at some point by any chance?  Leave that
call for the end.
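
As a rough illustration of point 1, here is a minimal Java sketch of
pulling rows in fixed-size batches keyed on the last id seen, so that only
one batch is held in memory at a time.  Row, RowSource, fetchAfter and
inMemorySource are hypothetical names made up for this example; they are
not part of Lucene or JDBC.  In a real indexer, fetchAfter would presumably
run a query along the lines of "WHERE id > ? ORDER BY id" with a row limit,
and the callback would build a Document and pass it to
IndexWriter.addDocument().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedFetch {

    /** Hypothetical row type: a primary key plus the column values to index. */
    public static final class Row {
        public final long id;
        public final String[] values;

        public Row(long id, String[] values) {
            this.id = id;
            this.values = values;
        }
    }

    /** Hypothetical source: returns up to 'limit' rows with id > afterId, ordered by id. */
    public interface RowSource {
        List<Row> fetchAfter(long afterId, int limit);
    }

    /**
     * Fetches rows batch by batch and hands each row to 'indexer'.
     * Only one batch is referenced at a time, so earlier batches can be
     * garbage collected. Returns the number of rows processed.
     */
    public static long indexInBatches(RowSource source, int batchSize, Consumer<Row> indexer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long lastId = Long.MIN_VALUE; // assumes no row uses Long.MIN_VALUE as its id
        long total = 0;
        while (true) {
            List<Row> batch = source.fetchAfter(lastId, batchSize);
            if (batch.isEmpty()) {
                break; // no rows left
            }
            for (Row row : batch) {
                indexer.accept(row); // e.g. build a Document and add it to the index
                lastId = row.id;     // remember where the next batch starts
                total++;
            }
        }
        return total;
    }

    /** In-memory stand-in for a table sorted by id, for demonstration only. */
    public static RowSource inMemorySource(final List<Row> table) {
        return (afterId, limit) -> {
            List<Row> out = new ArrayList<>();
            for (Row r : table) {
                if (r.id > afterId && out.size() < limit) {
                    out.add(r);
                }
            }
            return out;
        };
    }
}
```

The batch size is a tuning knob: a few hundred to a few thousand rows is a
reasonable place to start, but that figure is a guess and not something
measured here.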

1500 documents with 30 columns of short String/number values is not a
lot.  You may be doing something else, unrelated to Lucene, that's
slowing things down.
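
One way to check where the time goes is to time the DB fetch and the
indexing call separately.  The following is only a minimal sketch;
PhaseTimer and the phase names are made up for this example, and it
measures wall-clock time only, not memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhaseTimer {
    // Accumulated wall-clock nanoseconds and call counts per phase name.
    private final Map<String, Long> nanos = new LinkedHashMap<>();
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Runs 'work' and adds its elapsed time to the named phase. */
    public void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            nanos.merge(phase, System.nanoTime() - start, Long::sum);
            counts.merge(phase, 1, Integer::sum);
        }
    }

    public long totalNanos(String phase) {
        return nanos.getOrDefault(phase, 0L);
    }

    public int calls(String phase) {
        return counts.getOrDefault(phase, 0);
    }

    /** One line per phase, in the order the phases were first seen. */
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanos.entrySet()) {
            sb.append(e.getKey()).append(": ")
              .append(e.getValue() / 1_000_000L).append(" ms over ")
              .append(counts.get(e.getKey())).append(" calls\n");
        }
        return sb.toString();
    }
}
```

Wrapping the row fetch in timer.time("fetch", ...) and the addDocument()
call in timer.time("index", ...), then printing timer.report() every
thousand rows or so, should show which side slows down as the run
progresses.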

Otis


--- "Homam S.A." <homam_sa@yahoo.com> wrote:

> I'm trying to index a large number of records from the
> DB (a few million). Each record will be stored as a
> document with about 30 fields, most of them are
> UnStored and represent small strings or numbers. No
> huge DB Text fields.
> 
> But I'm running out of memory very fast, and the
> indexing is slowing down to a crawl once I hit around
> 1500 records. The problem is each document is holding
> references to the string objects returned from
> ToString() on the DB field, and the IndexWriter is
> holding references to all these document objects in
> memory, so the garbage collector is not getting a chance
> to clean these up.
> 
> How do you guys go about indexing a large DB table?
> Here's a snippet of my code (this method is called for
> each record in the DB):
> 
> private void IndexRow(SqlDataReader rdr, IndexWriter iw) {
> 	Document doc = new Document();
> 	for (int i = 0; i < BrowseFieldNames.Length; i++) {
> 		doc.Add(Field.UnStored(BrowseFieldNames[i], rdr.GetValue(i).ToString()));
> 	}
> 	iw.AddDocument(doc);
> }
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org