lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Poor QPS with highlighting
Date Thu, 05 Feb 2009 22:12:26 GMT
http://en.wikipedia.org/wiki/Google_platform

Document server summarization

On Thu, Feb 5, 2009 at 12:57 PM, Michael Stoppelman <stopman@gmail.com>wrote:

> On Thu, Feb 5, 2009 at 12:47 PM, Michael Stoppelman <stopman@gmail.com
> >wrote:
>
> >
> >
> > On Thu, Feb 5, 2009 at 9:05 AM, Jason Rutherglen <
> > jason.rutherglen@gmail.com> wrote:
> >
> >> Google uses dedicated highlighting servers.  Maybe this architecture
> would
> >> work for you.
> >>
> >
> > What's your reference? I used to work at Google.
> >
>
> I think creating a separate index/service would be reasonable and it's what
> I purposed in a previous email on this thread...
> "One option to get around the changing scoring would be to to run a
> completely separate index for highlighting (with the overlapping docs you
> described)."
>
> Still do lucene developers think storing the offsets is a bad idea from an
> index size prospective or some other reason?
>
> M
>
>
> >
> >> On Mon, Feb 2, 2009 at 11:24 PM, Michael Stoppelman <stopman@gmail.com
> >> >wrote:
> >>
> >> > Hi all,
> >> >
> >> > My search backends are only able to eek out 13-15 qps even with the
> >> entire
> >> > index in memory (this makes it very expensive to scale). According to
> my
> >> > YourKit profiler 80% of the program's time ends up in highlighting.
> With
> >> > highlighting disabled my backend gets about 45-50 qps (cheaper
> scaling)!
> >> > We're using Mark's TokenSources contrib. to make reconstructing of the
> >> > document quicker. I was contemplating patching the index to store
> >> offsets
> >> > for every term (instead of just the ordinal positions) so that I could
> >> make
> >> > the highlighting faster (since you would know where you hit in the
> >> document
> >> > on the search pass). I saw this thread from 2004:
> >> >
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg04743.html-
> >> > which asks about adding offsets to the index but it was decided
> against
> >> > because it would make the index too large. I can totally understand
> >> this;
> >> > but as machines get more beefy it would probably be nice to make this
> >> > optional since having 15 qps vs 50qps is quite a trade-off right now.
> >> Are
> >> > other folks seeing this? My documents are quite big sometimes up to
> 300k
> >> > tokens. Also my document fields are compressed which is also a time
> sink
> >> > for
> >> > the cpu.
> >> >
> >> > Please let me know if you need more details, happy to share.
> >> >
> >> > Sincerely,
> >> > M
> >> >
> >>
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message