http://issues.apache.org/bugzilla/show_bug.cgi?id=28183
[Patch] replace DocumentWriter with InvertedDocument for performance
Summary: [Patch] replace DocumentWriter with InvertedDocument for performance
Product: Lucene
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: Enhancement
Priority: Other
Component: Index
AssignedTo: lucene-dev@jakarta.apache.org
ReportedBy: brian-apache@slesinsky.org
I've found a way to improve Lucene's indexing performance by about 45% for my dataset.
Currently, the indexing process works like this:
- Use DocumentWriter to create an inverted index and serialize a one-document segment to a RAMDirectory.
- When enough documents have been read, deserialize the one-document segments in the RAMDirectory and merge them, writing the merged segment to disk.
What I've done instead is create a new class, InvertedDocument, that keeps the inverted index in a Map and can also be used directly as input for a merge. This avoids the serialization/deserialization step, and the RAMDirectory is no longer used when indexing.
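To illustrate the idea only, here is a much-simplified sketch in modern Java. It is not the code in the patch: the class name InvertedDocumentSketch, the whitespace tokenizer, and the merge() method are hypothetical stand-ins, and the real class works with Lucene's analyzers, fields, and segment merger rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration; not the InvertedDocument class from the patch. */
class InvertedDocumentSketch {
    // term -> positions at which the term occurs in this one document
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    /** Inverts whitespace-separated text (assumption: no real Analyzer here). */
    public InvertedDocumentSketch(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return; // nothing to index
        }
        String[] tokens = trimmed.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], k -> new ArrayList<>()).add(pos);
        }
    }

    public Map<String, List<Integer>> postings() {
        return postings;
    }

    /**
     * Merges in-memory documents directly, with no serialize/deserialize
     * round trip. Result: term -> (document number -> positions).
     */
    public static Map<String, Map<Integer, List<Integer>>> merge(
            List<InvertedDocumentSketch> docs) {
        Map<String, Map<Integer, List<Integer>>> merged = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (Map.Entry<String, List<Integer>> e
                    : docs.get(docId).postings.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                      .put(docId, e.getValue());
            }
        }
        return merged;
    }
}
```

The point of the sketch is that the per-document Map is itself the merge input, so nothing has to be written to and read back from a RAMDirectory before the merged segment goes to disk.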
The patch applies to the contents of CVS as of today (April 3). (It's a big patch and includes some minor style tweaks that aren't directly related.)
I did the performance testing using a simple application that creates an index from a file containing messages extracted from a bulletin board. It could index about 100 kilobytes/second with Lucene 1.3, and about 145 kilobytes/second with the patch. This is on a 700 MHz eMac, which is pretty slow at Java, and the documents being indexed are, on average, less than a screenful.