lucene-java-user mailing list archives

From Olivier Binda <olivier.bi...@wanadoo.fr>
Subject Re: remapping docIds in a read only offline built index
Date Mon, 02 Jun 2014 07:22:19 GMT
Hello, I'm still interested in an answer to the following question:

In a 1-segment read-only index (one that is built offline once and then 
frozen), is it possible to remap the docIds?



I may have a (working but not optimal) answer to my original problem: I 
may use a MultiReader over 3 indexes to get the following composite index:

docId   : document
-------------------------
1       : bookA
2       : bookB
....

M       : linkA
M+1     : linkB
...
N+1     : sentenceA
N+2     : sentenceB
...
300000  : sentenceZZZ


This solution should be slower than if I had built only 1 index in which 
each docId equals the order in which I added the documents.
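
To illustrate where the extra cost of a composite index may come from, here 
is a minimal sketch of how a global docId can be resolved to a sub-index and 
a local docId through cumulative offsets. It is plain Java with no Lucene 
dependency; the class CompositeDocIds and its methods are hypothetical names 
made up for this example, not the MultiReader API, and docIds are 0-based 
here, unlike the 1-based listing above.

```java
import java.util.Arrays;

/** Plain-Java illustration only; this is NOT the Lucene MultiReader API. */
public class CompositeDocIds {
    // starts[i] = first global docId of sub-index i; starts[n] = total doc count.
    private final int[] starts;

    public CompositeDocIds(int... subIndexSizes) {
        starts = new int[subIndexSizes.length + 1];
        for (int i = 0; i < subIndexSizes.length; i++) {
            starts[i + 1] = starts[i] + subIndexSizes[i];
        }
    }

    /** Returns the position of the sub-index that holds the given global docId. */
    public int subIndex(int globalDocId) {
        int n = starts.length - 1;
        if (globalDocId < 0 || globalDocId >= starts[n]) {
            throw new IndexOutOfBoundsException("docId " + globalDocId);
        }
        int pos = Arrays.binarySearch(starts, 0, n, globalDocId);
        if (pos < 0) {
            // Not an exact start: the insertion point is one past the owning sub-index.
            return -pos - 2;
        }
        // Exact hit: skip over empty sub-indexes that share the same start.
        while (pos + 1 < n && starts[pos + 1] == globalDocId) {
            pos++;
        }
        return pos;
    }

    /** Returns the docId relative to the sub-index that holds it. */
    public int localDocId(int globalDocId) {
        return globalDocId - starts[subIndex(globalDocId)];
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 2 books, 3 links, 4 sentences.
        CompositeDocIds ids = new CompositeDocIds(2, 3, 4);
        System.out.println(ids.subIndex(5) + " / " + ids.localDocId(5));
    }
}
```

With a single index whose docIds already follow insertion order, this 
lookup step would not be needed on each access.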

On 05/12/2014 06:01 PM, Olivier Binda wrote:
> In a 1-segment (parallel) read-only index, that is built offline once 
> (and then frozen),
> is it possible to remap the docIds as the last step (i.e. to have 
> the exact same index, except that the docIds are all equal to the order 
> in which the docs were added to the index)?
>
> Say I have the read only index
>
> docId   : document
> 1 : bookB
> 2 : sentenceB
> 3 : linkA
> 4 : linkC
> 5 : sentenceC
> 6 : sentenceA
> 7 : bookA
> ...
> 300000 : linkD
>
> I would like to have instead the read-only index
>
> docId   : document
> 1 : bookA
> 2 : bookB
> ....
>
> M : linkA
> M+1: linkB
> ...
> N+1 : sentenceA
> N+2 : sentenceB
> ...
> 300000:sentenceZZZ
>
> This would allow me to reduce the amount of ram to cache the type of 
> each document
>
> -> without remapping, I need at least ceil(log2(types)) * documents bits
> here 2 * 300000 bits
>
> -> with remapping, I need only to remember ints M and N
>
> Also, if I need to cache 1 byte of metadata for each book
>
> -> without remapping, I would need 1 byte * documents
> here 300000 bytes
>
> -> with remapping, I would only need 1 byte * books
> here M - 1 bytes
>
>
> I tried building such an index with 
> LogMergePolicy/NoMergePolicy/extending the ram buffer but (maybe I 
> did something wrong)
> the docIds were always reshuffled (maybe because my index was big and 
> I was over a threshold)
>
>
>
> Best regards,
> Olivier
>

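The RAM estimate in the quoted message can be sketched as follows, assuming 
the docIds are contiguous per type (books first, then links, then 
sentences). The class DocTypeLookup, its field names, and the 0-based 
boundaries firstLink and firstSentence are hypothetical; they stand in for 
the ints M and N of the message, and nothing here is a Lucene API.

```java
/** Plain-Java sketch of the memory argument; all names are made up for illustration. */
public class DocTypeLookup {
    public enum Type { BOOK, LINK, SENTENCE }

    private final int firstLink;      // 0-based docId of the first link
    private final int firstSentence;  // 0-based docId of the first sentence
    private final byte[] bookMeta;    // one byte per book only, indexed by docId

    public DocTypeLookup(int firstLink, int firstSentence) {
        if (firstLink < 0 || firstSentence < firstLink) {
            throw new IllegalArgumentException("boundaries must be ordered");
        }
        this.firstLink = firstLink;
        this.firstSentence = firstSentence;
        this.bookMeta = new byte[firstLink];
    }

    /** With contiguous ranges, the type follows from two comparisons. */
    public Type typeOf(int docId) {
        if (docId < firstLink) {
            return Type.BOOK;
        }
        return docId < firstSentence ? Type.LINK : Type.SENTENCE;
    }

    /** Metadata is only stored for books, so the array has firstLink entries. */
    public byte bookMeta(int docId) {
        if (typeOf(docId) != Type.BOOK) {
            throw new IllegalArgumentException("docId " + docId + " is not a book");
        }
        return bookMeta[docId];
    }

    public void setBookMeta(int docId, byte value) {
        if (typeOf(docId) != Type.BOOK) {
            throw new IllegalArgumentException("docId " + docId + " is not a book");
        }
        bookMeta[docId] = value;
    }

    /** Bits needed to store a type tag per document: ceil(log2(types)) * documents. */
    public static long bitsWithoutRemapping(int types, int documents) {
        int bitsPerDoc = 32 - Integer.numberOfLeadingZeros(types - 1);
        return (long) bitsPerDoc * documents;
    }
}
```

For 3 types and 300000 documents, bitsWithoutRemapping gives 2 * 300000 
bits, whereas the range-based lookup only keeps two ints plus one byte per 
book.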
