lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3855) DocValues support
Date Tue, 06 Nov 2012 00:08:12 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491065#comment-13491065 ]

Adrien Grand commented on SOLR-3855:
------------------------------------

bq. Are you sure only one thing makes sense? What if I need integers that are larger than a
short, but the range of values (max-min) is actually small. Then a Packed impl could make more
sense. So we should think about this...

I understand your point. I am a big supporter of packed ints myself and will probably use them
more often than fixed ints, but I still think that fixed_ints would be a good default: no one
would be surprised if the doc values of a field that is an int in their schema required
4 bytes per value.

But if Lucene were able to switch automatically from packed ints to fixed_ints whenever fixed
ints have less than x% overhead, that would be great!
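
To make the x% idea concrete, here is a minimal sketch in plain Java. It is not Lucene code: the
class and method names are made up for illustration, and "overhead" is assumed to mean the extra
bits per value that a fixed width costs relative to the packed width.

```java
// Illustration only: hypothetical helper, not part of Lucene or Solr.
final class DocValuesWidthSketch {

    /** Bits per value needed to store (value - min) for values in [min, max]; at least 1. */
    static int bitsRequired(long min, long max) {
        // If max - min overflows, delta is negative and the result is 64 bits,
        // which is the correct width for such a range.
        long delta = max - min;
        return delta == 0 ? 1 : 64 - Long.numberOfLeadingZeros(delta);
    }

    /**
     * Returns true if the fixed width should be used, i.e. if it costs at most
     * maxOverheadPercent more bits per value than the packed width.
     */
    static boolean preferFixed(long min, long max, int fixedBits, double maxOverheadPercent) {
        int packedBits = bitsRequired(min, max);
        if (packedBits >= fixedBits) {
            return true; // packing saves nothing
        }
        double overheadPercent = 100.0 * (fixedBits - packedBits) / packedBits;
        return overheadPercent <= maxOverheadPercent;
    }

    public static void main(String[] args) {
        // Large values, tiny range: packing wins by a wide margin.
        System.out.println(preferFixed(100_000L, 100_010L, 32, 20.0));
        // Range needs 31 bits: a fixed 32-bit int costs only about 3% more.
        System.out.println(preferFixed(0L, 2_000_000_000L, 32, 20.0));
    }
}
```

With a threshold of 20%, the first field (values between 100000 and 100010) would stay packed at
4 bits per value, while the second would fall back to plain 32-bit ints.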

bq. Well I don't think there should be so many types

If you want to sort on a String field, there are 6 available types. And I think it should
be easy for people getting started with Solr to do simple things such as sorting data without
having to understand the different trade-offs of these doc values types in order to choose
one. Otherwise the risk is that they keep using the field cache instead because they find
it more convenient.

(I hate this argument because some people will certainly have trouble with SORTED doc values
on a unique field of a very large index, but surely it is still better than the field cache?)

bq. In my opinion instead of IndexWriter streaming docvalues to the codec directly, only to
have the codec buffer up in RAM and use Counter for accounting, IndexWriter should buffer and
things like STRAIGHT/VAR would just be optimizations...

+1

{quote}I'm still worried about this case: I don't like them treated as stored fields. It's
only going to be more seeks if people have disk-enabled dvs that we must fetch in addition
to the stored fields.
I haven't looked at the relevant bits, but is it possible we could treat "*" as just meaning
the stored fields still? Basically, if you CHOOSE to request them, you get them, but we don't
do anything trappy.{quote}

If we allow for direct doc values, it makes sense not to load them by default. But I think
we should add documentation to the example schema.xml so that people know that storing a field
is wasteful when it has doc values enabled and held in memory, and that such a field can easily
be added to the response by adding its name to the fl parameter.

In case the unique key has doc values and is not stored, maybe it still makes sense to fetch
it when fl=*?
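
A rough sketch of the fl semantics discussed above, in plain Java. This is not Solr code: the
class, method and field names are hypothetical. It assumes that "*" expands to stored fields
only, that doc values fields are returned only when named explicitly, and that a unique key
which only has doc values is still returned for fl=*.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only: hypothetical helper, not part of Solr.
final class FieldListSketch {

    /** Resolves the fl entries to the set of field names to return. */
    static Set<String> resolve(List<String> fl, Set<String> stored,
                               Set<String> docValues, String uniqueKey) {
        Set<String> result = new LinkedHashSet<>();
        for (String field : fl) {
            if (field.equals("*")) {
                // "*" keeps meaning "all stored fields": no extra doc values lookups.
                result.addAll(stored);
                // Assumed exception: the unique key is returned even if it only has doc values.
                if (docValues.contains(uniqueKey)) {
                    result.add(uniqueKey);
                }
            } else if (stored.contains(field) || docValues.contains(field)) {
                // A field that is named explicitly is returned from whichever source has it.
                result.add(field);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> stored = new LinkedHashSet<>(Arrays.asList("title", "body"));
        Set<String> docValues = new LinkedHashSet<>(Arrays.asList("id", "price"));
        System.out.println(resolve(Arrays.asList("*"), stored, docValues, "id"));
        System.out.println(resolve(Arrays.asList("*", "price"), stored, docValues, "id"));
    }
}
```

Under these assumptions fl=* never touches "price", and fl=*,price adds it only because the
client chose to request it.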


                
> DocValues support
> -----------------
>
>                 Key: SOLR-3855
>                 URL: https://issues.apache.org/jira/browse/SOLR-3855
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.1, 5.0
>
>         Attachments: SOLR-3855.patch
>
>
> It would be nice if Solr supported DocValues:
>  - for ID fields (fewer disk seeks when running distributed search),
>  - for sorting/faceting/function queries (faster warmup time than fieldcache),
>  - better on-disk and in-memory efficiency (you can use packed impls).

