lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <>
Subject Re: Poor QPS with highlighting
Date Tue, 03 Feb 2009 09:14:24 GMT
>>My documents are quite big sometimes up to 300ktokens.

You could look at indexing them as seperate documents using overlapping sections of text.
Erik used this for one of his projects.


----- Original Message ----
From: Michael Stoppelman <>
Sent: Tuesday, 3 February, 2009 7:24:06
Subject: Poor QPS with highlighting

Hi all,

My search backends are only able to eek out 13-15 qps even with the entire
index in memory (this makes it very expensive to scale). According to my
YourKit profiler 80% of the program's time ends up in highlighting. With
highlighting disabled my backend gets about 45-50 qps (cheaper scaling)!
We're using Mark's TokenSources contrib. to make reconstructing of the
document quicker. I was contemplating patching the index to store offsets
for every term (instead of just the ordinal positions) so that I could make
the highlighting faster (since you would know where you hit in the document
on the search pass). I saw this thread from 2004: -
which asks about adding offsets to the index but it was decided against
because it would make the index too large. I can totally understand this;
but as machines get more beefy it would probably be nice to make this
optional since having 15 qps vs 50qps is quite a trade-off right now. Are
other folks seeing this? My documents are quite big sometimes up to 300k
tokens. Also my document fields are compressed which is also a time sink for
the cpu.

Please let me know if you need more details, happy to share.



To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message