lucene-java-user mailing list archives

From Olivier Binda
Subject Re: Fields, Index segments and docIds
Date Tue, 29 Apr 2014 07:30:59 GMT
This really helps! I didn't know about MultiReader. It looks like
exactly what I need for 1 and 2.

For 3: remapping docIds would allow me to use them as ids for my data,
instead of having a stored field with my ids (which is usually the
officially recommended way to do this in Lucene).

It may not be a good idea, but since my index is read-only, it might be
acceptable (a trade-off between speed/complexity and maintainability)?

Now, with MultiReader, maybe I don't need docId remapping.

Thanks for the great answer!

On 04/29/2014 08:46 AM, Uwe Schindler wrote:
> Hi Oliver,
> To me it looks like you are making this much more complicated than necessary. It also seems
> that you misunderstood join queries, which seems to be your main problem. Comments inline:
>> My lucene Index is built and stored in a zip file (uncompressed) which is used
>> as a read-only Directory.
>> 1) At Lucene indexing time, is it possible to rewrite the index so that some
>> fields are only found in some segments? Say:
>>    EnglishWords, EnglishVerbs go to Segment 1
>>    GermanWords, GermanSentences go to Segment 2
>>    French, frenchWines go to Segment 3 ...
> You can create exactly the same index structure manually without dealing with Lucene internals.
> Just index every language into a separate index with a separate IndexWriter. As those indexes
> are read-only, you can call forceMerge(1) after indexing, so that each index has exactly one
> segment, i.e. every language has one single segment.
> The only difference is that you would need a separate ZIP file for every language (which
> is probably what you need anyway, because you want to ship "language packs"). Otherwise you
> have to rewrite your ZIP Directory implementation to work on subdirectories inside the ZIP file.
>> 2) In what file is the index structure written (number of indexes,
>> docValues...)? And is it possible to tamper with this in some way? Say, in a
>> Directory at the start of my application, to tell the Lucene index
>> to use this segment or not.
> If every language is a separate index, just use "new MultiReader(indexReader1, indexReader2,
> indexReader3)" to combine them and query the MultiReader. This is the same structure
> as a single DirectoryReader (which is also handled as a MultiReader internally) and therefore
> has no speed impact.
>> If 1, 2 were possible, I think that it would allow me to ship my index
>> in a modular way in my apps (with language packs)
>> and do join queries as regular queries, with no speed penalty
> The "join" keyword seems to be your main misunderstanding. There is no relation between
> join queries and multiple indexes. In Lucene, "join" queries join documents
> of different types in the same index! Querying multiple indexes together is not joining; it
> is simple and very fast (because this is how Lucene was designed): just use the MultiReader approach
> from above to query all indexes at the same time. As a MultiReader over many single-segment
> DirectoryReaders is identical to one large DirectoryReader with n segments, there is no difference at all.
> This is something different:
>> 3) At lucene indexing time, is it possible to remap the docId values  (I saw
>> some MergeState.mapDocId method...) Say
>>    0 -> 4
>>    1 -> 3
>>    2 -> 1
>>    3 -> 0
>>    4 -> 2
>> If 3 is possible, it would allow me to have some sort of
>> forward/backward compatibilities with my shipped language packs
>> and also to have fast implementations for some id related methods
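
The mapping listed in question 3 is a permutation of the doc IDs 0..4. As a sketch of what such a remapping amounts to as data, independent of whether Lucene can apply it at indexing time, the plain-Java class below stores the old-to-new table and derives the inverse. The class DocIdRemap and its methods are hypothetical and are not part of any Lucene API.

```java
import java.util.Arrays;

// Plain-Java model of a doc ID permutation and its inverse (illustration only, not Lucene code).
final class DocIdRemap {
    private final int[] oldToNew;
    private final int[] newToOld;

    DocIdRemap(int... oldToNew) {
        this.oldToNew = oldToNew.clone();
        this.newToOld = new int[oldToNew.length];
        Arrays.fill(newToOld, -1);
        for (int oldId = 0; oldId < oldToNew.length; oldId++) {
            int newId = oldToNew[oldId];
            // Every target must be in range and used exactly once, otherwise it is not a permutation.
            if (newId < 0 || newId >= oldToNew.length || newToOld[newId] != -1) {
                throw new IllegalArgumentException("not a permutation at old ID " + oldId);
            }
            newToOld[newId] = oldId;
        }
    }

    int toNew(int oldId) {
        return oldToNew[oldId];
    }

    int toOld(int newId) {
        return newToOld[newId];
    }

    public static void main(String[] args) {
        // The mapping from the question: 0 -> 4, 1 -> 3, 2 -> 1, 3 -> 0, 4 -> 2.
        DocIdRemap remap = new DocIdRemap(4, 3, 1, 0, 2);
        for (int oldId = 0; oldId < 5; oldId++) {
            System.out.println(oldId + " -> " + remap.toNew(oldId));
        }
    }
}
```

A table like this can also live outside the index, which is one way to keep application ids stable across shipped packs without relying on Lucene's internal doc IDs.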
> What do you want to do? Why do you want to do this? (Please refer to the XY problem.)
> Uwe