lucene-java-user mailing list archives

From "Beard, Brian" <Brian.Be...@mybir.com>
Subject highlighter / fragmenter performance for large fields
Date Mon, 13 Oct 2008 17:08:27 GMT
We index some documents which have an "all" field containing all of the
data which can be searched on.

One of the problems we're having is that when this field is large, say
10 MB, the highlighter takes about a second to calculate the best
fragments, while the search itself takes only 30 milliseconds. I've
accounted for the time to load the text, which is generally about 5-10x
faster than the highlighting: 0.1-0.2 seconds go to loading the text
from the document, and the other 0.8-0.9 seconds to performing
highlighting.

I've overridden maxDocBytesToAnalyze so the highlighter analyzes the
entire field of the document. At least for now, we need to try to match
against the entire document.

I've also tried using a SimpleAnalyzer when the highlighting is
performed, but this doesn't seem to affect performance much.

Also, I've modified the QueryScorer so it can do wildcard term matches
without extracting the terms from the index. (We're using a
ConstantScoreQuery to get around the MaxBooleanClauses exception, and
with that query highlighting doesn't work.) Basically, if a term doesn't
match in the highlighter, it then tries to pattern match against the
wildcard search terms. That adds some processing, but disabling it
doesn't seem to affect performance much.
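
To make the fallback described above concrete, here is a minimal sketch
in plain Java of matching a token against wildcard search terms without
touching the index. It is not the modified QueryScorer itself; the class
name WildcardTermMatcher and its methods are hypothetical, and it
assumes the usual wildcard convention where * matches any run of
characters and ? matches exactly one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): checks tokens against
// wildcard search terms by translating each term into a regex once.
final class WildcardTermMatcher {
    private final List<Pattern> patterns = new ArrayList<>();

    WildcardTermMatcher(List<String> wildcardTerms) {
        // Compile up front so per-token matching does no parsing work.
        for (String term : wildcardTerms) {
            patterns.add(Pattern.compile(toRegex(term)));
        }
    }

    // Assumed convention: '*' = any run of characters, '?' = exactly one.
    // Everything else is quoted so it is matched literally.
    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.toString();
    }

    // True if the whole token matches at least one wildcard term.
    boolean matches(String token) {
        for (Pattern p : patterns) {
            if (p.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

A scorer could call matches(token) only after the exact term lookup
fails, which keeps the extra cost limited to tokens that would otherwise
score zero.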

One other thing I tried was a simple regex search over the text, without
using a scorer or analyzer. This runs about 2x faster, but is still
relatively slow.
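
For reference, a regex-only pass of this kind might look like the sketch
below. It is an assumption about the approach, not the code that was
actually timed: the class name RegexFragmentFinder is hypothetical, and
splitting the text into fixed-size windows is a simplification compared
with a real fragmenter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): picks the fixed-size window
// of the text that contains the most regex hits, with no analyzer and
// no scorer involved.
final class RegexFragmentFinder {

    static String bestFragment(String text, Pattern termPattern, int fragmentSize) {
        if (fragmentSize <= 0) {
            throw new IllegalArgumentException("fragmentSize must be positive");
        }
        int windows = (text.length() + fragmentSize - 1) / fragmentSize;
        if (windows == 0) {
            return "";
        }

        // One pass over the text: count hits by the window each match starts in.
        int[] hits = new int[windows];
        Matcher m = termPattern.matcher(text);
        while (m.find()) {
            if (m.end() == m.start()) {
                continue; // ignore empty matches
            }
            hits[m.start() / fragmentSize]++;
        }

        // Keep the first window with the highest hit count.
        int best = -1;
        int bestHits = 0;
        for (int i = 0; i < windows; i++) {
            if (hits[i] > bestHits) {
                bestHits = hits[i];
                best = i;
            }
        }
        if (best < 0) {
            return ""; // no match anywhere in the text
        }
        int start = best * fragmentSize;
        int end = Math.min(start + fragmentSize, text.length());
        return text.substring(start, end);
    }
}
```

The scan is linear in the length of the text, so on a 10 MB field the
cost is dominated by the regex engine walking every character, which may
explain why this kind of approach helps but does not remove the delay.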

Has anyone had any good experience with performing fragmentation and
highlighting for larger documents?

Thanks,

Brian Beard

