lucene-java-user mailing list archives

From Michael Stoppelman <stop...@gmail.com>
Subject Re: Poor QPS with highlighting
Date Tue, 03 Feb 2009 19:40:07 GMT
On Tue, Feb 3, 2009 at 1:14 AM, mark harwood <markharw00d@yahoo.co.uk> wrote:

> >>My documents are quite big, sometimes up to 300k tokens.
>
> You could look at indexing them as separate documents using overlapping
> sections of text. Erik used this for one of his projects.
>

Can you describe this in a little more detail? I'm not exactly sure what you
mean.
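
One possible reading of the suggestion, as a rough sketch (an assumption,
not a description of how Erik's project did it): split each large document's
token stream into fixed-size windows that share some tokens with their
neighbour, and index each window as its own document. A phrase that straddles
a window boundary still appears whole in one window as long as the overlap is
at least as long as the phrase, and the highlighter only has to re-analyze a
small window instead of all 300k tokens. The class name, method name, and
parameters below are hypothetical, and the code uses plain Java collections
rather than any Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OverlappingChunks {
    /**
     * Splits tokens into windows of at most chunkSize tokens. Each window
     * starts (chunkSize - overlap) tokens after the previous one, so it
     * repeats the last `overlap` tokens of its predecessor.
     */
    static List<List<String>> chunk(List<String> tokens, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= overlap < chunkSize");
        }
        List<List<String>> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + chunkSize, tokens.size());
            chunks.add(new ArrayList<>(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break; // the last window already reaches the end of the document
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        // Windows of 4 tokens, each sharing 2 tokens with the previous one.
        System.out.println(chunk(tokens, 4, 2));
        // prints [[a, b, c, d], [c, d, e, f], [e, f, g]]
    }
}
```

Each window would then be added to the index as a separate document, carrying
the id of the parent document in a stored field so that hits can be grouped
back together at search time. The trade-off is a larger index (every token in
an overlap is indexed twice) and possible duplicate hits from adjacent windows.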


> Cheers
> Mark
>
>
>
> ----- Original Message ----
> From: Michael Stoppelman <stopman@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Tuesday, 3 February, 2009 7:24:06
> Subject: Poor QPS with highlighting
>
> Hi all,
>
> My search backends are only able to eke out 13-15 qps even with the entire
> index in memory (this makes it very expensive to scale). According to my
> YourKit profiler 80% of the program's time ends up in highlighting. With
> highlighting disabled my backend gets about 45-50 qps (cheaper scaling)!
> We're using Mark's TokenSources contrib to make reconstructing the
> document quicker. I was contemplating patching the index to store offsets
> for every term (instead of just the ordinal positions) so that I could make
> the highlighting faster (since you would know where you hit in the document
> on the search pass). I saw this thread from 2004:
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg04743.html -
> which asks about adding offsets to the index but it was decided against
> because it would make the index too large. I can totally understand this;
> but as machines get more beefy it would probably be nice to make this
> optional since having 15 qps vs 50 qps is quite a trade-off right now. Are
> other folks seeing this? My documents are quite big, sometimes up to 300k
> tokens. Also, my document fields are compressed, which is another time sink
> for the CPU.
>
> Please let me know if you need more details, happy to share.
>
> Sincerely,
> M
>
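
On the offsets idea from the original message: the reason stored offsets
would help is that, once the character span of every matching term is known
from the search pass, highlighting reduces to a single copy over the stored
text with no re-tokenizing. A minimal sketch of that last step follows. The
`Span` type and `highlight` method are hypothetical names for illustration
and are not part of Lucene; how the spans get out of the index is left out
entirely.

```java
import java.util.Arrays;
import java.util.List;

class OffsetHighlighter {
    /** Character span [start, end) of one matched term (hypothetical type). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Wraps each span in <b>...</b>. Assumes the spans are sorted by start
     * offset and do not overlap.
     */
    static String highlight(String text, List<Span> spans) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (Span s : spans) {
            sb.append(text, pos, s.start);          // unmatched text before the hit
            sb.append("<b>").append(text, s.start, s.end).append("</b>");
            pos = s.end;
        }
        sb.append(text, pos, text.length());        // tail after the last hit
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "fast search with fast highlighting";
        List<Span> hits = Arrays.asList(new Span(0, 4), new Span(17, 21));
        System.out.println(highlight(text, hits));
        // prints <b>fast</b> search with <b>fast</b> highlighting
    }
}
```

The work here is linear in the length of the text plus the number of hits,
which is where the hoped-for speed-up over re-analyzing a 300k-token document
would come from. The cost is the extra index size the 2004 thread was worried
about.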
