lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <>
Subject [jira] [Commented] (SOLR-3855) DocValues support
Date Wed, 31 Oct 2012 08:47:13 GMT


Adrien Grand commented on SOLR-3855:

bq. We could combine these? e.g. a docValueType of "none" or something? This would parallel
the lucene apis and maybe make things a bit simpler.

Good point.

Additionally, I currently force doc values to be non-direct (i.e. in-memory). Do you think that
is fine, or should we give people the choice? I wasn't sure when writing the patch because
I think direct doc values would give irregular performance depending on the good will of the I/O cache
(I was thinking of people benchmarking with a read-only index, then going into production
and performing a sort on a large result set while a background merge is running (eating all
the I/O cache memory) and BOOM!). But maybe I'm too pessimistic. :-)

bq. it would be really great if fieldcache and docvalues had the same API

Yes it would make things so much easier... I also wish DocValues.Source and FunctionValues
were the same class.

bq. Would be awesome if faceting etc could use docvalues: though I think there is likely some
work for the multivalued case?

Right, DocValues faceting has its own challenges. :-) But that's clearly an issue where merging
fieldcache, DocValues.Source and FunctionValues would make things easier: we would have only
one code base that is independent of the source of "values", and SOLR-1581 would almost come
for free.

bq. I didn't look at this part, but is this really true? its numFields * rows right?

I was thinking of non-direct doc values for ID fields. Correct me if I'm wrong but when doing
a distributed search:

 1. createMainQuery: Solr first asks every shard for the IDs of the best (start + rows) docs
 2. createRetrieveDocs: Solr selects the {{rows}} IDs of documents to display and asks the
shards they are stored on for their stored fields

So step 1 requires {{(start + rows)}} seeks in the FDT file per shard (to know their IDs)
and step 2 requires {{rows}} seeks overall. So the total is {{(numShards * (start + rows))
+ rows}}. If we stored document IDs in memory I think this could help reduce this number to
{{rows}} (only the second step), which would be great, especially for deep paging or large
number of shards.
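
To make the arithmetic above concrete, here is a rough sketch of the two seek counts. The class and method names are made up for illustration (they are not Solr APIs), and it assumes one FDT seek per document whose ID or stored fields are read, as in the estimate above.

```java
// Back-of-the-envelope FDT seek counts for a two-step distributed search.
// SeekEstimate and its methods are hypothetical names, not part of Solr.
public final class SeekEstimate {
    private SeekEstimate() {}

    // IDs read from stored fields: each shard does (start + rows) seeks in
    // step 1, then step 2 adds rows seeks overall.
    public static long withStoredIds(int numShards, int start, int rows) {
        return (long) numShards * ((long) start + rows) + rows;
    }

    // IDs held in memory (assumption: step 1 then needs no FDT seek at all),
    // so only the rows seeks of step 2 remain.
    public static long withInMemoryIds(int numShards, int start, int rows) {
        return rows;
    }

    public static void main(String[] args) {
        int numShards = 4, start = 90, rows = 10;
        System.out.println(withStoredIds(numShards, start, rows));   // 410
        System.out.println(withInMemoryIds(numShards, start, rows)); // 10
    }
}
```

With 4 shards, {{start=90}} and {{rows=10}}, that is 410 seeks versus 10, and the gap grows linearly with both the number of shards and the paging depth.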

bq. But in general if docvalues are presented like stored fields for general purposes I think
thats not a great illusion to give to the user in case they have a lot of fields?

Of course it makes no sense to store all fields in DocValues, I think they are best used for
ID fields, sorting, scoring factors (function queries) and (soon :)) faceting. I wanted them
to behave like stored fields so that users don't make their fields stored in addition to DocValues
for convenience (this is a waste of space, and the bigger the FDT file is, the more likely
the I/O cache can't serve disk seeks in this file).
> DocValues support
> -----------------
>                 Key: SOLR-3855
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.1, 5.0
>         Attachments: SOLR-3855.patch
> It would be nice if Solr supported DocValues:
>  - for ID fields (fewer disk seeks when running distributed search),
>  - for sorting/faceting/function queries (faster warmup time than fieldcache),
>  - better on-disk and in-memory efficiency (you can use packed impls).
