lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3855) DocValues support
Date Wed, 31 Oct 2012 05:49:12 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487562#comment-13487562 ]

Robert Muir commented on SOLR-3855:
-----------------------------------

Warning: I have only skimmed the patch.

{quote}
    configured on a per-field-type basis (docValueType=...),
    enabled on a per-field basis (docValues=true/false)
{quote}

Could we combine these, e.g. with a docValueType of "none" or something? This would parallel the
Lucene APIs and maybe make things a bit simpler.
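
To make the suggestion concrete, here is a minimal sketch (in Python, for brevity) of the two ways the effective setting could be resolved. The function names and the "some_type" value are hypothetical; only docValueType, docValues and "none" come from the discussion above, and this is not how the patch implements it.

```python
def resolve_two_settings(doc_value_type, doc_values):
    """Two settings: a type on the field type plus an on/off flag on the field."""
    return doc_value_type if doc_values else None


def resolve_single_setting(doc_value_type):
    """Combined alternative: a docValueType of "none" means doc values are disabled."""
    return None if doc_value_type == "none" else doc_value_type


# "some_type" is a placeholder, not a real Solr or Lucene type name.
print(resolve_two_settings("some_type", True))
print(resolve_single_setting("none"))
```

Both variants express the same states; the combined form just needs one attribute instead of two.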

{quote}
When doc values are enabled, they have precedence over the field cache for getValueSource
and getSortField, however faceting and stats cannot use doc values yet (I would like to do
this as a separate issue).
{quote}

Ultimately it would be really great if fieldcache and docvalues had the same API. I worry
about the fact that it's not this way currently. This shouldn't block this patch, it's just
a semi-related discussion... it seems like fieldcache should be presented as "build docvalues
on the fly for the field".

It would be awesome if faceting etc. could use docvalues, though I think there is likely some
work needed for the multivalued case? E.g. we would have to encode multiple tokens at a level above
into the single-valued StraightBytes or whatever, a la DocTermOrds? Or maybe we should think
about an actual type for this that can allow for more efficient impls?
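
As a rough illustration of "encoding multiple tokens at a level above" into one single-valued bytes field, here is a sketch using a simple length-prefixed layout. This is an assumed scheme chosen for clarity, not the DocTermOrds format and not anything from the patch; the function names are hypothetical.

```python
import struct


def encode_tokens(tokens):
    """Pack several byte-string tokens into one bytes value.

    Each token is written as a 2-byte big-endian length followed by its bytes.
    """
    out = bytearray()
    for tok in tokens:
        if len(tok) > 0xFFFF:
            raise ValueError("token too long for a 2-byte length prefix")
        out += struct.pack(">H", len(tok))
        out += tok
    return bytes(out)


def decode_tokens(blob):
    """Inverse of encode_tokens: split one bytes value back into its tokens."""
    tokens = []
    pos = 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">H", blob, pos)
        pos += 2
        tokens.append(blob[pos:pos + length])
        pos += length
    return tokens


packed = encode_tokens([b"red", b"green", b"blue"])
print(decode_tokens(packed))
```

Every consumer has to decode the whole value to see the tokens, which is part of why a dedicated multivalued type might allow more efficient implementations.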

{quote}
I also modified a lot of code (ReturnFields especially) to make DocValues behave like stored
fields. I think this would be great for ID fields. In a cluster that has numShards shards,
it would help decrease the number of disk seeks in the .fdt file (which is often too big to
fit entirely in the OS cache) per request from (numShards * (start + rows) + rows) to rows.
{quote}

I didn't look at this part, but is this really true? It's numFields * rows, right? If it's some
special case for ID fields where #idfields=1 for distributed search or whatever, I think that's
a good optimization for that use case. But in general, if docvalues are presented like stored
fields for general purposes, I think that's not a great illusion to give to the user in case
they have a lot of fields?
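
A small sketch of the arithmetic being discussed, using the formula from the quoted description and the numFields * rows estimate above. The function names are hypothetical, and the counts are a simplified model of seeks, not measurements.

```python
def stored_id_seeks(num_shards, start, rows):
    """Seeks in the .fdt file per request, per the quoted patch description."""
    return num_shards * (start + rows) + rows


def docvalues_id_seeks(rows):
    """Seeks per request when only the ID field is read, per the quoted description."""
    return rows


def docvalues_all_fields_seeks(num_fields, rows):
    """Estimate when every returned field is read from docvalues (numFields * rows)."""
    return num_fields * rows


# Example: 4 shards, first page of 10 rows, documents with 20 fields.
print(stored_id_seeks(4, 0, 10))           # 50
print(docvalues_id_seeks(10))              # 10
print(docvalues_all_fields_seeks(20, 10))  # 200
```

With a single ID field the docvalues path wins (10 vs. 50 in this example), but at 20 fields the per-field cost reaches 200, which is the concern about presenting docvalues as general-purpose stored fields.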

Thanks for getting this started!
                
> DocValues support
> -----------------
>
>                 Key: SOLR-3855
>                 URL: https://issues.apache.org/jira/browse/SOLR-3855
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.1, 5.0
>
>         Attachments: SOLR-3855.patch
>
>
> It would be nice if Solr supported DocValues:
>  - for ID fields (fewer disk seeks when running distributed search),
>  - for sorting/faceting/function queries (faster warmup time than fieldcache),
>  - better on-disk and in-memory efficiency (you can use packed impls).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

