lucene-java-user mailing list archives

From Britske <gbr...@gmail.com>
Subject Re: merging Parallel indexes (can indexWriter.addIndexesNoOptimize be used?)
Date Wed, 04 Nov 2009 11:46:13 GMT

Thanks, but it's already guaranteed that the indexes are in sync, so I can
(and do) use ParallelReader to search them both at the same time. This is
what my running index looks like.

However, at certain points I am considering storing a frozen copy of the
parallel index for backup and other purposes. I figured that having it merged
into a single index would shave off some complexity.

Perhaps this should go to the dev list, but anyway: before giving this a rest,
would it be hard to write some low-level code to merge the two? I've never
touched the low-level classes and methods (inside documentWriter), but I'm
looking for a way to change the segment files directly: .frq, .tis and .tii in
particular. The rest can remain untouched for my setup, if I'm correct. In
other words, I would like to bypass writer.addDocument(), because I believe
that being able to change the index files directly would be far more
efficient for my situation.

Which classes and methods would I need to look into for changing / writing
these files directly?

Some background info on why I think a merge could be done rather efficiently
in this particular situation if I had access to these low-level methods:

- we have index A with stored and indexed fields, and index B with only
indexed fields (with omitTf / omitNorms = true)
- merge index B into index A.
--> probably no need to change Field Index (.fdx) and Field Data (.fdt)
(because nothing in B is stored and the docids are in the same order)
--> no change to Positions (.prx) or Normalization Factors (.nrm) (because
omitTf / omitNorms = true)

- all fields in index B are prefixed with a particular sequence
--> because of this shared prefix and the lexicographical ordering of Term
Infos (.tis) and Term Info Index (.tii), I could drop the terms of index B
into these files sequentially, as one block
--> similarly, because Frequencies (.frq) follows the ordering of .tis, I
could drop the .frq of index B into the correct position of the .frq of
index A and be done with it (again because of the same prefix used on all
fields of index B)

Would this work? And where to start looking? 
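
To make the prefix argument above concrete, here is a minimal sketch in plain
Java. It is a toy model only: it uses in-memory lists instead of Lucene
classes or the real segment files, and every name in it (PrefixSpliceSketch,
Entry, splice, flattenPostings) is made up for illustration. It assumes that
terms are ordered by field name and then by term text, that no field in
index A starts with the prefix, and that docids line up between A and B. As
far as I understand the real .tis / .frq files, they also involve field
numbers, prefix-compressed terms and delta-coded pointers, so a literal
byte-level copy would need more work than this model shows.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: plain Java collections, no Lucene classes or file formats. */
final class PrefixSpliceSketch {

    /** One term (field name + term text) with its posting list of doc ids. */
    static final class Entry implements Comparable<Entry> {
        final String field;
        final String text;
        final int[] docIds;

        Entry(String field, String text, int... docIds) {
            this.field = field;
            this.text = text;
            this.docIds = docIds;
        }

        /** Order by field name first, then by term text. */
        @Override
        public int compareTo(Entry other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    /**
     * Inserts the sorted terms of B as one contiguous block into the sorted
     * terms of A. Valid only if every B field starts with the prefix and no
     * A field does; both conditions are checked.
     */
    static List<Entry> splice(List<Entry> a, List<Entry> b, String prefix) {
        for (Entry e : b) {
            if (!e.field.startsWith(prefix)) {
                throw new IllegalArgumentException("B field lacks prefix: " + e.field);
            }
        }
        int insertAt = 0;
        for (Entry e : a) {
            if (e.field.startsWith(prefix)) {
                throw new IllegalArgumentException("A field uses prefix: " + e.field);
            }
            // A is sorted, so the fields below the prefix form its leading run.
            if (e.field.compareTo(prefix) < 0) {
                insertAt++;
            }
        }
        List<Entry> merged = new ArrayList<Entry>(a.size() + b.size());
        merged.addAll(a.subList(0, insertAt));
        merged.addAll(b);
        merged.addAll(a.subList(insertAt, a.size()));
        return merged;
    }

    /** Concatenates posting lists in term order, like a simplified .frq. */
    static int[] flattenPostings(List<Entry> terms) {
        int total = 0;
        for (Entry e : terms) {
            total += e.docIds.length;
        }
        int[] out = new int[total];
        int pos = 0;
        for (Entry e : terms) {
            System.arraycopy(e.docIds, 0, out, pos, e.docIds.length);
            pos += e.docIds.length;
        }
        return out;
    }
}
```

In this model, if A holds the fields "brand" and "name" and B holds fields
prefixed with "dyn_", the whole B block lands between the "brand" terms and
the "name" terms, and B's postings form one contiguous run in the flattened
postings.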

Thanks in advance, 
Geert-Jan




Michael McCandless-2 wrote:
> 
> addIndexesNoOptimize is only for shards.
> 
> But this [pending patch/contribution] is similar to what you're seeking, I
> think:
> 
>   https://issues.apache.org/jira/browse/LUCENE-1879
> 
> It does not actually merge the indexes, but rather keeps 2 parallel
> indexes in sync so you can use ParallelReader to search them
> coherently.
> 
> Mike
> 
> On Tue, Nov 3, 2009 at 1:46 PM, Britske <gbrits@gmail.com> wrote:
>>
>> Given two parallel indexes which contain the same products but different
>> fields, one with slowly changing fields and one with fields which are
>> updated regularly:
>>
>> Is it possible to periodically merge these to form a single index (thereby
>> representing a frozen snapshot in time)?
>>
>> For example: Can indexWriter.addIndexesNoOptimize handle this, or was it
>> (only) designed for merging shards?
>> If not, is there another option (3rd party or not) to use, or would I have
>> to resort to low-level hacking?
>>
>> Thanks,
>> Geert-Jan
>> --
>> View this message in context:
>> http://old.nabble.com/merging-Parallel-indexes-%28can-indexWriter.addIndexesNoOptimize-be-used-%29-tp26161322p26161322.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 

-- 
View this message in context: http://old.nabble.com/merging-Parallel-indexes-%28can-indexWriter.addIndexesNoOptimize-be-used-%29-tp26161322p26194788.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

