lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Stored fields and OS file caching
Date Sat, 05 Apr 2014 15:11:34 GMT
Are there any diagrams that illustrate this distinction in how
stored fields and doc values are organized, including both the heap and
non-heap aspects such as file caching? Sometimes a picture is worth 1K words.
Even a sketch drawn on a piece of paper and scanned would help.

-- Jack Krupansky

-----Original Message----- 
From: Adrien Grand
Sent: Friday, April 4, 2014 4:50 PM
To: java-user@lucene.apache.org
Subject: Re: Stored fields and OS file caching

Hi Vitaly,

Doc values are indeed well-suited for grouping and sorting. However
stored fields remain better at returning field values to users since
they guarantee a worst-case of one disk seek per document.

The filesystem cache typically caches data in blocks of 4KB. This
plays more nicely with doc values: given that they are stored in a
column-stride fashion, you load only the values of the fields you
actually read into the filesystem cache. On the other hand, with stored
fields, data is stored sequentially by document in a very large file, so
whenever you read a single field value, the operating system loads a 4KB
block into the filesystem cache that likely contains other fields' values
that you are not interested in.
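
To make the trade-off above concrete, here is a rough sketch that counts
how many 4KB pages get touched under each layout. It is a toy model, not
Lucene code: the class name PageCacheModel, the fixed-size values, and the
numbers in main are all assumptions made for illustration, and it ignores
compression and the real on-disk formats.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (hypothetical, not a Lucene API): every value has the same size
// and files are uncompressed, which real stored fields and doc values are not.
class PageCacheModel {
    static final int PAGE_SIZE = 4096; // 4KB cache block, as in the thread

    // Byte offset of one value. Row layout keeps all fields of a document
    // together; column layout keeps all documents' values of a field together.
    static long offset(boolean columnar, int doc, int field,
                       int numDocs, int numFields, int valueSize) {
        return columnar
            ? ((long) field * numDocs + doc) * valueSize
            : ((long) doc * numFields + field) * valueSize;
    }

    // Record every page overlapped by the byte range [start, start + length).
    static void addPages(Set<Long> pages, long start, int length) {
        long last = (start + length - 1) / PAGE_SIZE;
        for (long p = start / PAGE_SIZE; p <= last; p++) {
            pages.add(p);
        }
    }

    // Distinct pages touched when reading one field for every document
    // (the sorting / grouping access pattern).
    static int pagesForOneFieldAllDocs(boolean columnar, int numDocs,
                                       int numFields, int valueSize, int field) {
        Set<Long> pages = new HashSet<>();
        for (int doc = 0; doc < numDocs; doc++) {
            addPages(pages, offset(columnar, doc, field, numDocs, numFields, valueSize), valueSize);
        }
        return pages.size();
    }

    // Distinct pages touched when reading every field of one document
    // (the "return values to the user" access pattern).
    static int pagesForAllFieldsOneDoc(boolean columnar, int numDocs,
                                       int numFields, int valueSize, int doc) {
        Set<Long> pages = new HashSet<>();
        for (int field = 0; field < numFields; field++) {
            addPages(pages, offset(columnar, doc, field, numDocs, numFields, valueSize), valueSize);
        }
        return pages.size();
    }

    public static void main(String[] args) {
        int numDocs = 10_000, numFields = 16, valueSize = 64; // assumed sizes
        System.out.println("one field, all docs, row layout:    "
            + pagesForOneFieldAllDocs(false, numDocs, numFields, valueSize, 3));
        System.out.println("one field, all docs, column layout: "
            + pagesForOneFieldAllDocs(true, numDocs, numFields, valueSize, 3));
        System.out.println("all fields, one doc, row layout:    "
            + pagesForAllFieldsOneDoc(false, numDocs, numFields, valueSize, 5));
        System.out.println("all fields, one doc, column layout: "
            + pagesForAllFieldsOneDoc(true, numDocs, numFields, valueSize, 5));
    }
}
```

With these assumed sizes, reading one field for all documents touches 2500
pages in the row layout but only 157 in the column layout, while reading all
16 fields of a single document touches 1 page in the row layout and 16 in
the column layout. That matches the two halves of the argument: column-stride
storage is friendlier to the cache for sorting and grouping, and row storage
is cheaper per document when returning values to users.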



On Sat, Apr 5, 2014 at 12:23 AM, Vitaly Funstein <vfunstein@gmail.com> wrote:
> I use stored fields to load values for the following use cases:
> - to return per-document values as is, requested by the user - similar to
> listing DB columns you are interested in, in a "select ..." clause.
> - to perform aggregate function calculations while forming the result set
> (if requested).
> - for group-by type queries (would like to switch to the native grouping
> API, but don't think it supports grouping on multiple fields, or aggregate
> functions).
> - and finally, as I mentioned - to sort search results, also when
> requested.
>
> Evidently, even for simple queries that don't require any of the
> post-processing above but ask for a set of values from each document,
> there's still a non-trivial amount of disk activity... hence, I started
> second-guessing the implementation.
>
>
> On Fri, Apr 4, 2014 at 3:00 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>> Hi,
>>
>> What are you doing with the stored fields? They are not deprecated and
>> also not really slow, unless you scan over millions of documents in
>> random access order. To display search results, DocValues are of no use.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>> > -----Original Message-----
>> > From: Vitaly Funstein [mailto:vfunstein@gmail.com]
>> > Sent: Friday, April 04, 2014 9:44 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Stored fields and OS file caching
>> >
>> > I have heard here that stored fields don't work well with OS file
>> > caching. Could someone elaborate on why that is? I am using Lucene 4.6
>> > and we do use stored fields but not doc values; it appears most of the
>> > benefit from the latter comes as improvement in sorting performance,
>> > and I don't actually use Lucene for sorting at all; rather, it's done
>> > on a post-processing basis, based on stored field values (in a
>> > nutshell, the reason for this is Lucene's inability to tell apart
>> > terms that are empty strings vs. a missing value, resulting in
>> > unstable sort order on such fields).
>> >
>> > I am not sure if switching to using doc values fields from stored
>> > fields entirely would help leverage OS file cache better... what
>> > worries me is that when processing queries requesting multiple values
>> > from the document, doc value fields could cause multiple disk seeks to
>> > fetch values for each field, as opposed to just one with stored fields.
>> >
>> > Am I way off in my understanding of how this works? Any guidelines, as
>> > general as they may be, are appreciated.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>



-- 
Adrien


