lucene-dev mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: New Lucene features and Solr indexes
Date Wed, 13 Feb 2013 15:18:08 GMT
On 2/13/2013 2:42 AM, Adrien Grand wrote:
> Doc values are like FieldCache except that you don't need to uninvert
> values from the inverted index whenever you open a new Reader. I think
> there are two reasons why you would like to turn doc values on:

Confession -- that's almost gibberish to me!  At my current level of 
understanding, the pieces make some semblance of sense, but the whole 
thing falls apart before my head grasps it.  My fault, not yours. :)
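Here is a toy sketch of the distinction Adrien draws. It is in Python and assumes a single-valued field. The data and the function names `index` and `uninvert` are hypothetical and are not Lucene's API. An inverted index maps each term to the documents containing it. Doc values instead keep one value per document. "Uninverting" means rebuilding that per-document column from the postings, and a FieldCache-style approach has to redo this each time a new reader is opened.

```python
def index(values):
    """Index one single-valued field, building both structures.

    `values[doc_id]` is the field's value for that document (toy data).
    """
    postings = {}  # inverted index: term -> list of doc ids (used for searching)
    column = []    # doc values: one value per doc id, written at index time
    for doc_id, value in enumerate(values):
        postings.setdefault(value, []).append(doc_id)
        column.append(value)
    return postings, column


def uninvert(postings, max_doc):
    """Rebuild the per-document column from the inverted index.

    This walks every term and every posting, which is roughly the work a
    FieldCache-style approach repeats whenever a new reader is opened.
    Assumes a single-valued field (one term per document).
    """
    values = [None] * max_doc
    for term, doc_ids in postings.items():
        for doc_id in doc_ids:
            values[doc_id] = term
    return values


postings, column = index(["blue", "red", "green", "blue", "red"])
print(postings)                                   # {'blue': [0, 3], 'red': [1, 4], 'green': [2]}
print(uninvert(postings, len(column)) == column)  # True
```

With doc values, the `column` list already exists on disk after indexing, so opening a reader only has to load it. The `uninvert` loop never runs.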

>   - if you are indexing a field only for faceting, sorting or grouping
> (not searching), setting indexed=false and docValues=true will provide
> the same functionality and be lighter, both at indexing time (no need
> to invert the field) and when opening a new IndexReader (no need to
> uninvert the field),

I have some fields that mostly get used for sorting.  The most common 
field used for sorting is a seconds-since-epoch timestamp simply stored 
as a long (source is MySQL bigint).  We have another copy of it in tdate 
format that we use for date range searches.  I'll need to ask whether 
they are using it for searching or filtering before I make the long 
version indexed=false.
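The following is a minimal sketch of why sorting does not need the field to be indexed. It assumes a hypothetical doc-values column of seconds-since-epoch longs, one per document id, standing in for the sort field. The data and `top_n_by_field` are illustrative only. Sorting reads just that per-document column, so the searchable part (the tdate copy here) can handle the matching separately.

```python
import heapq

# Hypothetical doc-values column: one seconds-since-epoch long per doc id,
# like a timestamp field with indexed=false and docValues=true.
timestamps = [1360768688, 1360700000, 1360800000, 1360650000]


def top_n_by_field(column, matching_docs, n):
    """Return the n matching doc ids with the largest column values.

    Only the per-document column is read; the field itself never has to
    be searchable. A bounded heap keeps just the current top n.
    """
    return heapq.nlargest(n, matching_docs, key=lambda doc_id: column[doc_id])


# Suppose docs 0, 1 and 2 matched the query (for example a date range on
# the separate tdate copy); take the two newest.
print(top_n_by_field(timestamps, [0, 1, 2], n=2))  # [2, 0]
```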

>   - if the field is also used for searching, turning doc values on will
> give your Lucene index a little more work at indexing time (not a big
> deal in my opinion) but it will be faster to open (especially
> interesting if you're doing near-realtime search) and likely more
> memory-efficient.

I have a lot more index headroom thanks to stored/termvector 
compression.  My indexes fit entirely in available RAM now!  Even before 
the upgrade, when not all of the index data was being cached, I still had 
free RAM, so I have plenty of room for index growth.  I just have to 
convince them to start using the upgraded index copy so I can upgrade 
the other one.

> However doc values are useless for searching, so there is no need to
> turn them on on a field which is used solely for searching.
>
> Similarly to stored fields, doc values could help you retrieve the
> value of a field, but the trade-off is very different: stored fields
> are better at retrieving many fields of a single document efficiently
> while doc values are good at retrieving one field for a lot of
> documents efficiently. So if you want to get a field's value in the
> response, you should keep setting stored=true. There might be
> optimizations in the future for example if you're only asking for a
> single field which has doc values, but this will be transparent to
> you.
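The trade-off Adrien describes is essentially row-oriented versus column-oriented storage. The sketch below models stored fields as one record per document and doc values as one array per field. The documents, field names and helper functions are hypothetical, and real Lucene storage is compressed and on disk. It shows which access pattern each layout serves with a single lookup.

```python
# Hypothetical documents with three fields each.
docs = [
    {"id": "a", "title": "Lucene", "ts": 30},
    {"id": "b", "title": "Solr", "ts": 10},
    {"id": "c", "title": "Nutch", "ts": 20},
]

# Stored-fields-like layout: one record per document (row-oriented).
stored = [dict(d) for d in docs]

# Doc-values-like layout: one array per field (column-oriented).
columns = {field: [d[field] for d in docs] for field in docs[0]}


def load_document(doc_id):
    """All fields of one document: a single lookup in the row layout."""
    return stored[doc_id]


def load_field(field, doc_ids):
    """One field for many documents: reads from a single column."""
    col = columns[field]
    return [col[doc_id] for doc_id in doc_ids]


print(load_document(1))             # {'id': 'b', 'title': 'Solr', 'ts': 10}
print(load_field("ts", [0, 1, 2]))  # [30, 10, 20]
```

Fetching a whole document from `columns` would mean touching every field's array, while fetching one field for many documents from `stored` would mean visiting every record. That is the asymmetry behind keeping stored=true for values returned in the response.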

This suggests that adding docvalues to the uniqueKey field would be a 
good idea for distributed searching in general, since the first phase of 
a distributed search only retrieves that field and score.  That assumes 
of course that the docvalues are fully utilized for retrieving fields 
during that initial phase.
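The following is a rough model of that first phase. It assumes each shard can read the uniqueKey from a doc-values column, which the message treats as an open question. The shard data and the names `shard_phase_one` and `merge_phase_one` are hypothetical and are not Solr's API. Each shard returns only (score, uniqueKey) pairs for its top hits. The coordinator keeps the global top k and would then fetch stored fields for those keys in a second phase, which is not shown.

```python
import heapq


def shard_phase_one(scores, unique_keys, k):
    """Top-k (score, uniqueKey) pairs from one shard.

    `scores` maps local doc id -> score for the docs that matched;
    `unique_keys` is a hypothetical doc-values column of uniqueKey values.
    """
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [(score, unique_keys[doc_id]) for doc_id, score in best]


def merge_phase_one(shard_results, k):
    """Coordinator side: global top-k by score across all shards.

    The second phase (fetching stored fields for these keys) is not shown.
    """
    all_pairs = (pair for result in shard_results for pair in result)
    return heapq.nlargest(k, all_pairs)


shard1 = shard_phase_one({0: 1.5, 2: 0.7}, ["doc-a", "doc-b", "doc-c"], k=3)
shard2 = shard_phase_one({0: 0.9, 1: 2.1}, ["doc-x", "doc-y"], k=3)
print(merge_phase_one([shard1, shard2], k=3))
# [(2.1, 'doc-y'), (1.5, 'doc-a'), (0.9, 'doc-x')]
```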

Generally when we search, we retrieve all stored fields, so I will keep 
those around.  We already don't store every field, and advances we've 
made on the client side will probably allow me to stop storing more of 
them, further reducing our index size.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

