lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Field.Store.Compress - does it improve performance of document reads?
Date Mon, 21 May 2007 00:50:05 GMT
Have you tried the static Sort.INDEXORDER sort object in Lucene 2.1?

Erick

On 5/20/07, Andreas Guther <andreas.guther@gmail.com> wrote:
>
> Thank you for the clarification.
>
> I assume that hits are usually returned in ranking order, which makes
> sense in terms of how one usually wants to display the results.  In terms
> of access speed this is the unwanted order.  Although sorting the array is
> not a big deal, it might be worth providing search calls with a parameter
> that defines the hit order, so that Lucene can return the results in
> document id order if needed.
>
> Andreas
>
>
> On 5/18/07, Paul Elschot <paul.elschot@xs4all.nl> wrote:
> >
> > Otis,
> >
> > See below.
> >
> > On Friday 18 May 2007 05:03, Otis Gospodnetic wrote:
> > > ----- Original Message ----
> > > From: Paul Elschot <paul.elschot@xs4all.nl>
> > >
> > > On Thursday 17 May 2007 08:10, Andreas Guther wrote:
> > > > I am currently exploring how to solve performance problems I
> > > > encounter with Lucene document reads.
> > > >
> > > > We have, among other fields, one field (default) storing all
> > > > searchable fields.  This field can become considerably large since
> > > > we are indexing documents and storing the content for display
> > > > within results.
> > > >
> > > > I noticed that the read can be very expensive.  I wonder now if it
> > > > would make sense to add this field as Field.Store.COMPRESS to the
> > > > index.  Can someone tell me if this would speed up the document read
> > > > or if this is something only interesting for saving space?
> > >
> > > I have not tried the compression yet, but in my experience a good way
> > > to reduce the costs of document reads from a disk is by reading them
> > > in document number order whenever possible. In this way one saves
> > > on the disk head seeks.
> > > Compression should actually help reduce the costs of disk head seeks
> > > even more.
> > >
> > > OG: Does this really help in a multi-user environment where there are
> > > multiple parallel queries hitting the index and reading data from all
> > > over the index and the disk?  They will all share the same disk head,
> > > so the head will still have to jump around to service all these
> > > requests, even if each request is being careful to read documents in
> > > docId order, no?
> >
> > When the requests for reading docs come in parallel, the underlying OS
> > will normally favour the request for which the disk head needs to move
> > only very little. In those cases reading in doc id order also helps
> > throughput by allowing the OS to merge the requests for which the
> > head then needs to move even less per retrieved doc.
> > For example, with 3 retrieved docs per query:
> > 10 1 5 sorts into: 1 5 10.
> > 7 12 3 sorts into: 3 7 12.
> > 1 5 10 and 3 7 12 merge into: 1 3 5 7 10 12, total seek distance: 11.
> > Unsorted these could merge into: 7 10 12 3 1 5, total seek distance: 20.
> >
> > Iirc Hits does not retrieve docs in doc id order.
> >
> > Regards,
> > Paul Elschot
> >

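The seek-distance arithmetic in Paul's example can be checked with a minimal sketch. It assumes, as the example does, that document ids can stand in for disk positions, which is a simplification of how stored fields are actually laid out. The class and method names (SeekDistance, total, mergeSorted, sortedCopy) are hypothetical and not part of Lucene.

```java
import java.util.Arrays;

// Hypothetical helper, not a Lucene class: models doc ids as disk positions.
final class SeekDistance {

    // Sum of the absolute jumps between consecutive positions in read order.
    static int total(int[] readOrder) {
        int sum = 0;
        for (int i = 1; i < readOrder.length; i++) {
            sum += Math.abs(readOrder[i] - readOrder[i - 1]);
        }
        return sum;
    }

    // Returns a sorted copy, leaving the caller's array untouched.
    static int[] sortedCopy(int[] ids) {
        int[] copy = ids.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Merges two already sorted arrays into one sorted array.
    static int[] mergeSorted(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        // The two queries from the example, each sorted into doc id order.
        int[] q1 = sortedCopy(new int[] {10, 1, 5});
        int[] q2 = sortedCopy(new int[] {7, 12, 3});
        int[] merged = mergeSorted(q1, q2);
        System.out.println(Arrays.toString(merged) + " -> " + total(merged));

        // One possible interleaving when the requests are not sorted.
        int[] unsorted = {7, 10, 12, 3, 1, 5};
        System.out.println(Arrays.toString(unsorted) + " -> " + total(unsorted));
    }
}
```

Run as written, this prints a total of 11 for the merged sorted order and 20 for the unsorted interleaving, matching the figures in the example.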