lucene-dev mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: New Lucene features and Solr indexes
Date Wed, 13 Feb 2013 09:42:49 GMT
Hi Shawn,

On Tue, Feb 12, 2013 at 8:58 PM, Shawn Heisey <solr@elyograg.org> wrote:
> Some of these, like compressed stored fields and compressed termvectors, are
> being turned on by default, which is awesome.  I'm already running a 4.2
> snapshot, so I've got those in place.

Excellent!

> One thing that I know I would like to do is use the new BloomFilter for a
> couple of my fields that contain only unique values.  Last time I checked
> (which was before the 4.1 release), if you added the lucene-codecs jar, Solr
> had a BloomFilter postings format, but didn't have any way to specify the
> underlying format.  See SOLR-3950 and LUCENE-4394.

BloomFilteringPostingsFormat is a little special compared to other
postings formats because it can wrap any postings format. So maybe it
should require special support, like an additional attribute in the
field type definition?
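
As a rough illustration of the wrapping idea, here is a toy Python
sketch. It is not Lucene's implementation: DictPostings and
BloomWrapper are hypothetical names, and the bit-set size and hash
count are arbitrary assumptions. It only shows how a Bloom filter can
sit in front of any delegate and reject terms that were never indexed,
which is the common case when looking up keys in a unique-value field.

```python
import hashlib


class DictPostings:
    """Toy stand-in for a postings format: maps a term to its doc IDs."""

    def __init__(self):
        self.terms = {}
        self.lookups = 0  # how often the underlying structure is consulted

    def add(self, term, doc_id):
        self.terms.setdefault(term, []).append(doc_id)

    def postings(self, term):
        self.lookups += 1
        return self.terms.get(term, [])


class BloomWrapper:
    """Toy wrapper: a Bloom filter placed in front of any delegate."""

    def __init__(self, delegate, num_bits=1024, num_hashes=3):
        self.delegate = delegate
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, term):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term, doc_id):
        for pos in self._positions(term):
            self.bits |= 1 << pos
        self.delegate.add(term, doc_id)

    def postings(self, term):
        # A clear bit proves the term was never added, so skip the delegate.
        if any(not (self.bits >> pos) & 1 for pos in self._positions(term)):
            return []
        # All bits set: the term may exist (or be a false positive).
        return self.delegate.postings(term)
```

Because the wrapper only needs add() and postings() from its delegate,
any object with those two methods can be plugged in, which is why a
configuration would have to name the wrapped format separately.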

> Another new feature that is coming soon to Solr is DocValues - SOLR-3855.
> Looking at the issue, I was not able to tell what situations would be
> appropriate for using the feature.

Doc values are like FieldCache except that you don't need to uninvert
values from the inverted index whenever you open a new Reader. I think
there are two reasons why you would like to turn doc values on:
 - if you are indexing a field only for faceting, sorting or grouping
(not searching), setting indexed=false and docValues=true will provide
the same functionality and be lighter, both at indexing time (no need
to invert the field) and when opening a new IndexReader (no need to
uninvert the field),
 - if the field is also used for searching, turning doc values on will
add a little work at indexing time (not a big deal in my opinion) but
the index will be faster to open (especially interesting if you're
doing near-realtime search) and likely more memory-efficient.
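
To make the "uninvert" step concrete, here is a toy Python model. It
is an illustration under simplifying assumptions (one value per
document, plain in-memory lists), not how Lucene stores anything; all
function names are made up for the example. The point is that sorting
needs a doc-to-value column: FieldCache has to rebuild it from the
term-to-docs index each time a reader is opened, while doc values
write the column once at indexing time.

```python
def build_inverted_index(docs):
    """Index-time inversion: map each value (term) to the docs containing it."""
    inverted = {}
    for doc_id, value in enumerate(docs):
        inverted.setdefault(value, []).append(doc_id)
    return inverted


def uninvert(inverted, num_docs):
    """What a FieldCache-style structure does when a reader is opened:
    walk every term and rebuild the doc -> value column."""
    column = [None] * num_docs
    for value, doc_ids in inverted.items():
        for doc_id in doc_ids:
            column[doc_id] = value
    return column


def build_doc_values(docs):
    """Doc-values-style storage: the doc -> value column is written
    directly at indexing time, so nothing needs to be rebuilt later."""
    return list(docs)


def sort_doc_ids(column):
    """Sorting (like faceting or grouping) only needs the column."""
    return sorted(range(len(column)), key=lambda doc_id: column[doc_id])
```

Both routes end with the same column; the difference is when the cost
is paid, which is why the benefit shows up at reader-open time.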

However, doc values are useless for searching, so there is no need to
enable them on a field that is used solely for searching.

Similarly to stored fields, doc values could help you retrieve the
value of a field, but the trade-off is very different: stored fields
are better at retrieving many fields of a single document efficiently
while doc values are good at retrieving one field for a lot of
documents efficiently. So if you want to get a field's value in the
response, you should keep setting stored=true. There might be
optimizations in the future, for example if you're only asking for a
single field which has doc values, but this will be transparent to
you.
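
The trade-off can be pictured as row-oriented versus column-oriented
storage. The Python sketch below is a toy model with hypothetical
function names, not Lucene's on-disk layout: "stored fields" keep each
document's fields together, while "doc values" keep each field's
values for all documents together.

```python
def to_stored_fields(docs):
    """Row-oriented: one record per document, holding all of its fields."""
    return [dict(doc) for doc in docs]


def to_doc_values(docs, fields):
    """Column-oriented: one list per field, indexed by doc ID."""
    return {field: [doc[field] for doc in docs] for field in fields}


def fetch_document(stored, doc_id):
    """Cheap with stored fields: a single record yields every field."""
    return stored[doc_id]


def fetch_field(doc_values, field, doc_ids):
    """Cheap with doc values: a single column yields one field for many docs."""
    column = doc_values[field]
    return [column[doc_id] for doc_id in doc_ids]
```

Returning a full document from the column layout would touch one list
per field, and reading one field for many documents from the row
layout would touch one record per document, which is the asymmetry
described above.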

--
Adrien
