lucene-java-user mailing list archives

From "Andreas Guther" <andreas.gut...@gmail.com>
Subject Re: Field.Store.Compress - does it improve performance of document reads?
Date Sun, 20 May 2007 17:36:00 GMT
Thank you for the clarification.

I assume that hits are usually returned in ranking order, which makes sense
given how one usually wants to display results.  For access speed, though,
this is the wrong order.  Sorting the array is not a big deal, but it might
be worth considering search calls that take a parameter defining the hit
order, so that Lucene can return results in document id order when needed.
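
To make the sorting step concrete, here is a minimal sketch in plain Java. It
assumes the hit doc ids have already been copied into an int array; the class
name DocIdOrderReader and the loader callback are hypothetical stand-ins for
whatever call actually fetches a stored document, and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

public class DocIdOrderReader {
    /** Returns a sorted copy of the hit doc ids; the ranked array is left untouched. */
    public static int[] inDocIdOrder(int[] rankedDocIds) {
        int[] sorted = rankedDocIds.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    /**
     * Loads documents in ascending doc id order.
     * The loader is a hypothetical callback supplied by the caller.
     */
    public static <D> List<D> loadAll(int[] rankedDocIds, IntFunction<D> loader) {
        List<D> docs = new ArrayList<>();
        for (int docId : inDocIdOrder(rankedDocIds)) {
            docs.add(loader.apply(docId));
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] ranked = {10, 1, 5};  // doc ids in ranking order
        System.out.println(loadAll(ranked, id -> "doc-" + id));
        // prints [doc-1, doc-5, doc-10]
    }
}
```

Because the ranked array is not modified, the caller can still display results
in ranking order after the documents have been read in doc id order.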

Andreas


On 5/18/07, Paul Elschot <paul.elschot@xs4all.nl> wrote:
>
> Otis,
>
> See below.
>
> On Friday 18 May 2007 05:03, Otis Gospodnetic wrote:
> > ----- Original Message ----
> > From: Paul Elschot <paul.elschot@xs4all.nl>
> >
> > On Thursday 17 May 2007 08:10, Andreas Guther wrote:
> > > I am currently exploring how to solve performance problems I encounter
> > > with Lucene document reads.
> > >
> > > Among other fields, we have one field (the default) that stores all
> > > searchable fields.  This field can grow to a considerable size, since we
> > > are indexing documents and storing the content for display within results.
> > >
> > > I noticed that the read can be very expensive.  I wonder now if it would
> > > make sense to add this field as Field.Store.Compress to the index.  Can
> > > someone tell me if this would speed up the document read, or if this is
> > > something only interesting for saving space?
> >
> > I have not tried the compression yet, but in my experience a good way
> > to reduce the costs of document reads from a disk is by reading them
> > in document number order whenever possible. In this way one saves
> > on the disk head seeks.
> > Compression should actually help reduce the costs of disk head seeks
> > even more.
> >
> > OG: Does this really help in a multi-user environment where there are
> > multiple parallel queries hitting the index and reading data from all over
> > the index and the disk?  They will all share the same disk head, so the
> > head will still have to jump around to service all these requests, even if
> > each request is being careful to read documents in docId order, no?
>
> When the requests for reading docs come in parallel, the underlying OS
> will normally favour the request for which the disk head needs to move
> only very little. In those cases reading in doc id order also helps
> throughput by allowing the OS to merge the requests for which the
> head then needs to move even less per retrieved doc.
> For example, with 3 retrieved docs per query:
> 10 1 5 sorts into: 1 5 10.
> 7 12 3 sorts into: 3 7 12.
> 1 5 10 and 3 7 12 merge into: 1 3 5 7 10 12, total seek distance: 11.
> Unsorted these could merge into: 7 10 12 3 1 5, total seek distance: 20.
>
> IIRC, Hits does not retrieve docs in doc id order.
>
> Regards,
> Paul Elschot
>

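The seek-distance figures in the quoted example (11 for the merged sorted
order, 20 for the unsorted merge) can be checked with a small sketch. It
assumes, as the example does, that the doc id is a proxy for position on disk
and that distance is counted from the first document read; the class name
SeekDistance is hypothetical.

```java
public class SeekDistance {
    /** Sums the absolute differences between consecutive positions. */
    public static int total(int[] positions) {
        int sum = 0;
        for (int i = 1; i < positions.length; i++) {
            sum += Math.abs(positions[i] - positions[i - 1]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two sorted requests merged by the OS: 1 5 10 and 3 7 12.
        System.out.println(total(new int[] {1, 3, 5, 7, 10, 12})); // prints 11
        // One possible merge of the unsorted requests.
        System.out.println(total(new int[] {7, 10, 12, 3, 1, 5})); // prints 20
    }
}
```

For a fully sorted sequence the total is simply the last position minus the
first, which is the smallest distance that still visits every document.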