lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Adding Docvalues to a Field
Date Fri, 05 May 2017 22:32:12 GMT
Hi Aravinth,

To get rid of the partially merged (mixed) DocValues fields, you can use the following additional
approach on top of my previous mail:
 
> Erick was referring to Solr. To fix your issue without fully reindexing, you
> can use merging to update the whole index. To do this, use the following
> approach:
> 
> Wrap your index using UninvertingReader. Then get all LeafReaders using
> the leaves() method.

The problem with that approach is that all leaves that already have partial (!) DocValues are
seen by UninvertingReader as having DocValues, so it just returns the partial DocValues and
no uninverting is done. So we have to trick UninvertingReader into ignoring the already existing
(partial) DocValues. Instead of wrapping the whole IndexReader, we change the workflow:

- Get all leaves() of the broken docvalues/non-docvalues index
- Wrap all those LeafReader instances using an anonymous FilterLeafReader instance, overriding
all the DocValues-related methods to return "null" instead of calling super. This hides all
partially existing DocValues (not from FieldInfos, but that should not hurt). The consumer
of this reader will see no DocValues.
- Then wrap those filtered readers with new UninvertingReader(filteredLeaf). This adds back
fresh DocValues, recalculated from the uninverted fields. Be sure to get the types right;
otherwise you will get merge errors (incompatible field types).
- Then wrap all those uninverting leaves with SlowCodecReaderWrapper.wrap(). This makes them
mergeable (it's slow and costs memory, but it works).
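
The per-leaf steps in the list above could look roughly like the following minimal sketch. It is not code from this thread: it assumes the Lucene 5.x APIs the question is about (lucene-core plus lucene-misc for UninvertingReader), and the names RewrapLeaves, hideDocValues and wrapLeaves are hypothetical. It departs from the mail in two hedged ways: it only hides the fields listed in the uninverting mapping, so DocValues-only fields that cannot be rebuilt from the inverted index pass through untouched, and it also clears their DocValues type in FieldInfos, on the assumption that the uninverter consults FieldInfos and could otherwise report a type mismatch instead of uninverting. The FieldInfo constructor and the UninvertingReader.Type names differ in later Lucene versions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.CodecReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DocValuesType;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.FieldInfos;
import org.apache.lucene.index.FilterLeafReader;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.index.SlowCodecReaderWrapper;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.uninverting.UninvertingReader;
import org.apache.lucene.util.Bits;

// Hypothetical helper (sketch, assumes Lucene 5.x): hide, uninvert, and wrap each leaf.
public final class RewrapLeaves {

  /** Hides the (possibly partial) DocValues of the given fields in one leaf. */
  static LeafReader hideDocValues(LeafReader leaf, final Set<String> fields) {
    List<FieldInfo> infos = new ArrayList<>();
    for (FieldInfo fi : leaf.getFieldInfos()) {
      if (fields.contains(fi.name)) {
        // Same field, but without a DocValues type (Lucene 5.x FieldInfo constructor).
        infos.add(new FieldInfo(fi.name, fi.number, fi.hasVectors(), fi.omitsNorms(),
            fi.hasPayloads(), fi.getIndexOptions(), DocValuesType.NONE, -1, fi.attributes()));
      } else {
        infos.add(fi);
      }
    }
    final FieldInfos filtered = new FieldInfos(infos.toArray(new FieldInfo[infos.size()]));

    // Anonymous FilterLeafReader: return null for hidden fields, delegate for all others.
    return new FilterLeafReader(leaf) {
      @Override public FieldInfos getFieldInfos() { return filtered; }

      @Override public NumericDocValues getNumericDocValues(String f) throws IOException {
        return fields.contains(f) ? null : super.getNumericDocValues(f);
      }
      @Override public BinaryDocValues getBinaryDocValues(String f) throws IOException {
        return fields.contains(f) ? null : super.getBinaryDocValues(f);
      }
      @Override public SortedDocValues getSortedDocValues(String f) throws IOException {
        return fields.contains(f) ? null : super.getSortedDocValues(f);
      }
      @Override public SortedNumericDocValues getSortedNumericDocValues(String f) throws IOException {
        return fields.contains(f) ? null : super.getSortedNumericDocValues(f);
      }
      @Override public SortedSetDocValues getSortedSetDocValues(String f) throws IOException {
        return fields.contains(f) ? null : super.getSortedSetDocValues(f);
      }
      @Override public Bits getDocsWithField(String f) throws IOException {
        return fields.contains(f) ? null : super.getDocsWithField(f);
      }
    };
  }

  /** The mapping must list every field whose DocValues are mixed, with its target type. */
  static List<CodecReader> wrapLeaves(DirectoryReader reader,
      Map<String, UninvertingReader.Type> mapping) throws IOException {
    List<CodecReader> wrapped = new ArrayList<>();
    for (LeafReaderContext ctx : reader.leaves()) {
      LeafReader hidden = hideDocValues(ctx.reader(), mapping.keySet());
      // Adds back DocValues computed from the inverted index (held in memory).
      LeafReader uninverted = new UninvertingReader(hidden, mapping);
      // Slow, but gives IndexWriter.addIndexes a CodecReader it can merge.
      wrapped.add(SlowCodecReaderWrapper.wrap(uninverted));
    }
    return wrapped;
  }
}
```

Each mapping entry needs the type the field should end up with, for example UninvertingReader.Type.SORTED for a string sort field; a wrong type is what leads to the incompatible-field-type merge errors mentioned above.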

The remaining steps are as I said before:
 
> Then create a new index with IndexWriter, use
> IndexWriter.addIndexes(CodecReader...) and pass in the previously created
> wrappers, ideally one by one. Those readers are slow, but ready to be
> merged into a new index with DocValues. The empty writer will then import
> the wrapped index and take the emulated DocValues. This may take some
> time, but afterwards you have an index with all fields having their DocValues
> on disk. Uninverting is no longer needed.
> 
> I hope that helps. I can post code that should do this. There is no
> ready-to-use tool available, because you need to configure the uninverter correctly.
> 
> Uwe
> 
> On 5 May 2017 22:12:13 CEST, aravinth thangasami
> <aravinththangasami@gmail.com> wrote:
> >Thanks Erick
> >
> >On Fri, May 5, 2017 at 9:19 PM, Erick Erickson
> ><erickerickson@gmail.com> wrote:
> >
> >> In a word, "no". You must re-index from scratch. Worse, now that you
> >> have some segments thinking the fields are docValues and some not and
> >> maybe some mixed, I know of no way to disentangle them.
> >>
> >> I'd create a new collection and re-index it entirely, then use
> >> collection aliasing to point the applications at the new collection.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, May 5, 2017 at 2:49 AM, aravinth thangasami
> >> <aravinththangasami@gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > In the process of moving from Lucene 4 to Lucene 5, we ran into the
> >> > following issue: we have enabled doc values in Lucene 5, but we did
> >> > not use doc values in Lucene 4.
> >> >
> >> > Using UninvertingReader, sorting works fine until the first merge
> >> > happens. After a merge, documents from the older version without doc
> >> > values affect the sorting order.
> >> >
> >> > Is there any way to solve this issue without reindexing?
> >> >
> >> > What is your opinion on it?
> >> >
> >> > I was thinking about these two approaches. Would either be possible?
> >> >
> >> > 1. Can UninvertingReader be made to store the generated doc values
> >> > on disk?
> >> > 2. During a merge, can IndexWriter be made to write doc values for
> >> > documents that have none?
> >> >
> >> >
> >> >
> >> > Thanks
> >> > Aravinth
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> 
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de
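
The final step from Uwe's earlier mail, quoted above (an empty IndexWriter fed the wrapped readers one by one through addIndexes), might be sketched as below, again assuming Lucene 5.x. The class name CopyIntoNewIndex and the target path are hypothetical. StandardAnalyzer is only passed because the 5.x IndexWriterConfig constructor takes an analyzer (addIndexes analyzes no text); in 5.x it comes from the analyzers-common module. The input is the list of CodecReader wrappers built per leaf as described in Uwe's list.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.CodecReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper (sketch, assumes Lucene 5.x): copy wrapped leaves into a fresh index.
public final class CopyIntoNewIndex {

  static void copyInto(List<CodecReader> wrappedLeaves, Path target) throws IOException {
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer())
        .setOpenMode(IndexWriterConfig.OpenMode.CREATE); // start from an empty index
    try (Directory dir = FSDirectory.open(target);
         IndexWriter writer = new IndexWriter(dir, config)) {
      for (CodecReader leaf : wrappedLeaves) {
        // One reader per call; the DocValues the wrapper reports are written to disk.
        writer.addIndexes(leaf);
      }
      writer.commit();
    }
  }
}
```

Calling addIndexes once per reader, rather than passing all readers at once, follows the "ideally one by one" advice in the mail. Searches can then be pointed at the new directory.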

