lucene-java-user mailing list archives

From Vitaly Funstein <vfunst...@gmail.com>
Subject Re: Stored fields and OS file caching
Date Fri, 04 Apr 2014 22:23:32 GMT
I use stored fields to load values for the following use cases:
- to return per-document values as is, requested by the user - similar to
listing DB columns you are interested in, in a "select ..." clause.
- to perform aggregate function calculations while forming the result set
(if requested).
- for group-by type queries (would like to switch to the native grouping
API, but don't think it supports grouping on multiple fields, or aggregate
functions).
- and finally, as I mentioned - to sort search results, also when requested.
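
The group-by and aggregate use cases in the list above boil down to bucketing hits by a tuple of stored-field values and folding a numeric field within each bucket. Here is a minimal sketch in plain Java. It assumes each hit's stored fields have already been loaded into a `Map<String, String>`, a hypothetical stand-in for a Lucene `Document`. `GroupBySketch`, `groupBy` and `Agg` are illustrative names, not Lucene API. Using a `List` as the map key gives multi-field grouping with little extra code, because `List` defines value-based `equals` and `hashCode`.

```java
import java.util.*;

public class GroupBySketch {
    /** Per-group aggregate state: number of hits and running sum of one numeric field. */
    static final class Agg {
        long count;
        double sum;
    }

    /**
     * Groups hits by the values of several fields and sums one numeric field per group.
     * Each hit is a Map standing in for the stored fields loaded for one document
     * (hypothetical representation, not a Lucene type).
     */
    static Map<List<String>, Agg> groupBy(List<Map<String, String>> hits,
                                          List<String> groupFields,
                                          String valueField) {
        // LinkedHashMap keeps groups in first-seen order, so output order is deterministic.
        Map<List<String>, Agg> groups = new LinkedHashMap<>();
        for (Map<String, String> hit : hits) {
            List<String> key = new ArrayList<>(groupFields.size());
            for (String f : groupFields) {
                key.add(hit.get(f)); // null marks a missing field, kept distinct from ""
            }
            Agg agg = groups.computeIfAbsent(key, k -> new Agg());
            agg.count++;
            String v = hit.get(valueField);
            if (v != null) {
                agg.sum += Double.parseDouble(v);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = List.of(
            Map.of("region", "EU", "product", "a", "amount", "10"),
            Map.of("region", "EU", "product", "a", "amount", "5"),
            Map.of("region", "US", "product", "b", "amount", "7"));
        Map<List<String>, Agg> result = groupBy(hits, List.of("region", "product"), "amount");
        for (Map.Entry<List<String>, Agg> e : result.entrySet()) {
            System.out.println(e.getKey() + " count=" + e.getValue().count
                + " sum=" + e.getValue().sum);
        }
    }
}
```

Because a missing group field becomes a `null` key component, documents lacking the field form their own group instead of merging with documents whose value is the empty string.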

Evidently, even for simple queries that don't require any of the
post-processing above but ask for a set of values from each document,
there's still a non-trivial amount of disk activity, hence I started
second-guessing the implementation.
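
The trade-off behind this disk activity, and behind the original question (quoted below) about doc values costing one seek per field versus one per document for stored fields, can be illustrated with a toy page-count model. This is a sketch under loudly stated assumptions: fixed-width values, uncompressed data, no OS read-ahead or caching, and made-up sizes. It does not reflect Lucene's actual on-disk formats. "Row" stores all fields of a document contiguously, the way stored fields keep a document together, and "column" stores each field separately, the way doc values are organized per field. `LayoutPageModel`, `rowPages` and `columnPages` are hypothetical names.

```java
import java.util.*;

public class LayoutPageModel {
    /** Distinct pages touched when each document's fields are stored contiguously (row-oriented). */
    static int rowPages(int[] docs, int numFields, int valueBytes, int pageBytes) {
        Set<Long> pages = new HashSet<>();
        long recordBytes = (long) numFields * valueBytes;
        for (int d : docs) {
            long start = d * recordBytes;
            long end = start + recordBytes - 1; // last byte of this document's record
            for (long p = start / pageBytes; p <= end / pageBytes; p++) {
                pages.add(p);
            }
        }
        return pages.size();
    }

    /** Distinct pages touched when each field lives in its own column (column-oriented). */
    static int columnPages(int[] docs, int numFields, int valueBytes, int pageBytes) {
        Set<List<Long>> pages = new HashSet<>(); // key: (field, page within that field's column)
        for (int f = 0; f < numFields; f++) {
            for (int d : docs) {
                long start = (long) d * valueBytes;
                long end = start + valueBytes - 1;
                for (long p = start / pageBytes; p <= end / pageBytes; p++) {
                    pages.add(List.of((long) f, p));
                }
            }
        }
        return pages.size();
    }

    public static void main(String[] args) {
        int numFields = 5, valueBytes = 8, pageBytes = 4096; // assumed toy sizes
        int[] sparse = {0, 10000, 20000, 30000};            // a few hits scattered across the index
        int[] dense = new int[1000];                         // 1000 consecutive hits
        for (int i = 0; i < dense.length; i++) dense[i] = i;
        System.out.println("sparse: row=" + rowPages(sparse, numFields, valueBytes, pageBytes)
            + " column=" + columnPages(sparse, numFields, valueBytes, pageBytes)); // row=4 column=20
        System.out.println("dense: row=" + rowPages(dense, numFields, valueBytes, pageBytes)
            + " column=" + columnPages(dense, numFields, valueBytes, pageBytes)); // row=10 column=10
    }
}
```

In this model, a handful of scattered hits costs roughly one page per requested field per hit with the column layout but only one page per hit with the row layout. For a dense run of hits, both layouts read about the same number of pages.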


On Fri, Apr 4, 2014 at 3:00 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> What are you doing with the stored fields? They are not deprecated and
> also not really slow, unless you scan over millions of documents in random
> access order. To display search results, DocValues are of no use.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Vitaly Funstein [mailto:vfunstein@gmail.com]
> > Sent: Friday, April 04, 2014 9:44 PM
> > To: java-user@lucene.apache.org
> > Subject: Stored fields and OS file caching
> >
> > I have heard here that stored fields don't work well with OS file caching.
> > Could someone elaborate on why that is? I am using Lucene 4.6 and we do
> > use stored fields but not doc values; it appears most of the benefit of the
> > latter comes from improved sorting performance, and I don't actually use
> > Lucene for sorting at all; rather, it's done on a post-processing basis, based
> > on stored field values (in a nutshell, the reason for this is Lucene's
> > inability to tell apart terms that are empty strings vs. a missing value,
> > resulting in unstable sort order on such fields).
> >
> > I am not sure if switching entirely from stored fields to doc values fields
> > would help leverage the OS file cache better... what worries me is that when
> > processing queries requesting multiple values from the document, doc values
> > fields could cause multiple disk seeks (one per field) to fetch the values,
> > as opposed to just one with stored fields.
> >
> > Am I way off in my understanding of how this works? Any guidelines, as
> > general as they may be, are appreciated.
>
>
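
The post-processing sort described in the original message, which must keep an empty-string value distinct from a missing one, can be sketched with a plain Java `Comparator`. This assumes the sort field's stored value has been loaded per hit, with `null` standing for an absent field (for example, Lucene's `Document.get(name)` returns `null` when the document has no such field). `MissingVsEmptySort` and `Hit` are hypothetical names. Two choices here are assumptions rather than anything stated in the thread: placing missing values last, and breaking ties by doc ID so that equal keys always come out in the same order.

```java
import java.util.*;

public class MissingVsEmptySort {
    /** A search hit: doc ID plus the stored value of the sort field (null = field absent). */
    static final class Hit {
        final int docId;
        final String value;

        Hit(int docId, String value) {
            this.docId = docId;
            this.value = value;
        }

        @Override
        public String toString() {
            return docId + ":" + (value == null ? "<missing>" : "\"" + value + "\"");
        }
    }

    /** Ascending by value; "" sorts before non-empty strings, missing values go last, ties by doc ID. */
    static final Comparator<Hit> ORDER =
        Comparator.comparing((Hit h) -> h.value, Comparator.nullsLast(Comparator.<String>naturalOrder()))
                  .thenComparingInt(h -> h.docId);

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>(List.of(
            new Hit(3, "b"), new Hit(1, null), new Hit(4, ""), new Hit(2, "a"), new Hit(0, null)));
        hits.sort(ORDER);
        System.out.println(hits); // order by doc ID: 4, 2, 3, 0, 1
    }
}
```

The doc ID tie-breaker makes the result independent of the order in which hits were collected, which addresses the unstable sort order mentioned in the question.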
