lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From kumarlimbu <>
Subject Is storing 20 fields in a lucene document desirable?
Date Tue, 20 Nov 2007 11:29:44 GMT

Our document contains a total of 23 fields in one document and we STORE all
of them in lucene index. 

We have recently had some performance issues and our analysis has shown the
bottleneck to be lucene search and retrieval.

We have been thinking about reducing the number of fields per document by
removing unnecessary fields and by merging fields with similar weightings.
Will reducing the number of fields help to optimize performance?

Another issue is we are currently retrieving around 9 fields after we do a
search. Some are long text of up to 1000 words. Is it a large overhead to
retrieve long fields? 

We are considering the option of separating the search and retrieve parts so
that Lucene performs the search, MYSQL stores the data. We just store the
INDEXED field and primary key in the lucene index. After searching we only
return 1 field (primary key) instead of 9 fields. This field will be used
for retrieving the actual information from the MySQL database. Will reducing
the number of fields retrieved from lucene reduce the response time or will
using MySQL database make it worse? 

So our main concern is to find if retrieving fields usually takes longer
than searching or not? What does lucene spend most time doing – search or
retrieval? We are also concerned that using MySQL will have performance
issues because we will be doing I/O for MySQL as well as Lucene. We also add
around 100k documents each day and remove around the same number of
documents. Will this frequent read and write have impact on performance?

Our current index size is around 8G and contains around 3M documents.

If there are any suggestions or questions, please reply.

Thanks you all!


View this message in context:
Sent from the Lucene - Java Users mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message