lucene-solr-user mailing list archives

From Ronald Wood <rw...@smarsh.com>
Subject Re: Is it safe to upgrade an existing field to docvalues?
Date Wed, 24 Aug 2016 19:08:02 GMT
OK. Thank you, Alessandro, for clarifying this matter.

I wasn’t sure about this because the documentation is somewhat ambiguous on this point.
In the 6.1 Guide I see: “If you have already indexed data into your Solr index, you will
need to completely re-index your content after changing your field definitions in schema.xml
in order to successfully use docValues.”  Maybe that should read “...in order to successfully
sort or facet on that field or use any other features that depend on docValues. Partially
converted indexes will result in exceptions because of inconsistent data for docValues.”

Moreover, as I mentioned in my first post, I saw some indication that Solr will fall back
to using the UninvertingReader if it doesn’t find docValues as expected.

In my testing, I did see that /export was definitely an all or nothing case: all data had
to be docValues before I could get data. /select mostly works – except when it occasionally
doesn’t.

-------------------

*I wonder if I can make a proposal*: would it be possible to add a per-field property to the
schema called useDocValues (true/false), defaulting to true?

The idea is that with docValues=true, Solr would index docValues as before, but would not
read them for sorting, faceting and so on as long as useDocValues=false.

Once you are sure that every document has been re-indexed with docValues, you would set
useDocValues=true (or remove the property), and Solr would behave as it does now.

I spent a little time going down into the code and at first glance this seems feasible. I
would be willing to log the ticket and perhaps provide a patch.

Does this sound feasible to anyone else? I am uncertain whether this requires any changes at
the Lucene level, but looking at the Solr core code, all the switching is done in Solr on
field.hasDocValues. The code would be amended to (field.hasDocValues && field.useDocValues)
throughout.
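
To make the idea concrete, here is a minimal sketch of the gating logic. It is not Solr
code: FieldFlags is a hypothetical stand-in for Solr's schema field class, and useDocValues
is the proposed property, which does not exist today. The only point is that indexing would
depend on docValues alone, while every read path would check both flags.

```java
// Hypothetical sketch only. FieldFlags is NOT a Solr class; it stands in for
// the per-field schema properties discussed in this thread.
public final class UseDocValuesSketch {

    static final class FieldFlags {
        private final boolean docValues;     // docValues="true" in the schema
        private final boolean useDocValues;  // proposed property (assumption)

        FieldFlags(boolean docValues, boolean useDocValues) {
            this.docValues = docValues;
            this.useDocValues = useDocValues;
        }

        // The proposed property defaults to true, so existing schemas keep
        // today's behaviour.
        FieldFlags(boolean docValues) {
            this(docValues, true);
        }

        boolean hasDocValues() {
            return docValues;
        }

        boolean useDocValues() {
            return useDocValues;
        }

        // Index time: unchanged, docValues are written whenever they are enabled.
        boolean writeDocValues() {
            return hasDocValues();
        }

        // Query time: the amended check, hasDocValues && useDocValues.
        boolean readDocValues() {
            return hasDocValues() && useDocValues();
        }
    }

    public static void main(String[] args) {
        FieldFlags migrating = new FieldFlags(true, false); // re-index in progress
        FieldFlags migrated = new FieldFlags(true);         // property removed

        System.out.println("migrating: write=" + migrating.writeDocValues()
                + " read=" + migrating.readDocValues());
        System.out.println("migrated:  write=" + migrated.writeDocValues()
                + " read=" + migrated.readDocValues());
    }
}
```

During the migration the field would keep being served as if it had no docValues, and the
switch to reading them would be a schema change made only once re-indexing is complete.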

I would have to imagine this would be helpful to others out there with large amounts of data
to migrate.

- Ronald S. Wood 


On 8/24/16, 10:14, "Alessandro Benedetti" <abenedetti@apache.org> wrote:

    I am sorry Ronald but :
    "I ask because my presupposition has been that we could turn it on without
    any harm as we incrementally converted our indexes."
    
    This is not possible, if you change the schema and then slowly update the
    documents you are introducing inconsistency that will reflect in sorting
    and faceting.
    Because solr will check the field attributes, will see docValues, but then
    will find only partial docValues.
    So the docValue for some documents will be null.
    
    You need to go live one-shot.
    This is the reason Shawn and Toke suggest a parallel index, with the
    docValues enabled and finally you swap.
    
    Cheers
    
    On Wed, Aug 24, 2016 at 2:56 PM, Shawn Heisey <apache@elyograg.org> wrote:
    
    > On 8/23/2016 2:01 PM, Ronald Wood wrote:
    > > In general, is there a way to migrate existing indexes (we have
    > petabytes of data) by enabling docvalues and incrementally re-indexing? We
    > expect the latter would take a month using an atomic update process.
    >
    > One way to handle it is to build a new index with an updated
    > configuration, then switch to the new index.  Since you're not running
    > SolrCloud, you can switch by swapping the cores.  If you were running
    > SolrCloud, you'd need to alias the old name to the new collection, which
    > might involve deleting the old collection first.  Swapping cores in
    > cloud mode will break things.
    >
    > The other replies you've gotten are interesting.  The approach using
    > Atomic Updates will only work if your index meets the requirements for
    > Atomic Updates.
    >
    > https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
    >
    > You've already said it would take a month using atomic update ... which
    > might mean you've already thought about whether or not your index meets
    > the requirements.
    >
    > Toke's tool looks quite interesting, and would probably do the job a lot
    > faster than any other method.
    >
    > Thanks,
    > Shawn
    >
    >
    
    
    -- 
    --------------------------
    
    Benedetti Alessandro
    Visiting card : http://about.me/alessandro_benedetti
    
    "Tyger, tyger burning bright
    In the forests of the night,
    What immortal hand or eye
    Could frame thy fearful symmetry?"
    
    William Blake - Songs of Experience -1794 England
    

