lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: remapping docIds in a read only offline built index
Date Mon, 02 Jun 2014 07:26:21 GMT
The index sorting APIs (in lucene/misc) can do this.  E.g. you could
make a SortingAtomicReader, with your sort criteria, then use
addIndexes(IR[]) to add it to a new index.  That resulting index would
have 1 segment and the docIDs would be in your order.
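
To make "the docIDs would be in your order" concrete, here is a rough
sketch of the renumbering a sort implies. It is plain Java, not the
Lucene API: the class DocIdRemapSketch and its methods are made up for
illustration, the sort key is simply the document label from the example
in the quoted message, and docIDs are 0-based here whereas the quoted
message counts from 1.

```java
import java.util.Arrays;
import java.util.Comparator;

// Conceptual illustration only: hypothetical class, no Lucene types involved.
final class DocIdRemapSketch {

    // newToOld[newId] is the old docID that ends up at position newId
    // once the documents are ordered by their label.
    static int[] newToOld(String[] docs) {
        Integer[] order = new Integer[docs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Arrays.sort on objects is stable, so equal keys keep their old relative order.
        Arrays.sort(order, Comparator.comparing((Integer i) -> docs[i]));
        int[] result = new int[order.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = order[i];
        }
        return result;
    }

    // Inverts the permutation: oldToNew[oldId] is the docID after sorting.
    static int[] invert(int[] newToOld) {
        int[] oldToNew = new int[newToOld.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        String[] docs = {"bookB", "sentenceB", "linkA", "linkC",
                         "sentenceC", "sentenceA", "bookA"};
        int[] n2o = newToOld(docs);
        System.out.println(Arrays.toString(n2o));         // [6, 0, 2, 3, 5, 1, 4]
        System.out.println(Arrays.toString(invert(n2o))); // [1, 5, 2, 3, 6, 4, 0]
    }
}
```

With the seven documents from the example, books sort first, then links,
then sentences, so each type occupies one contiguous docID range.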

Mike McCandless

http://blog.mikemccandless.com


On Mon, May 12, 2014 at 12:01 PM, Olivier Binda
<olivier.binda@wanadoo.fr> wrote:
> In a 1-segment (parallel) read-only index, that is built offline once (and
> then frozen),
> is it possible to remap the docIds as the last step (i.e. to have the
> exact same index, except that each docId equals the order in which the
> doc was added to the index)?
>
> Say I have the read only index
>
> docId   : document
> 1 : bookB
> 2 : sentenceB
> 3 : linkA
> 4 : linkC
> 5 : sentenceC
> 6 : sentenceA
> 7 : bookA
> ...
> 300000 : linkD
>
> I would like to have instead the read-only index
>
> docId   : document
> 1 : bookA
> 2 : bookB
> ...
> M : linkA
> M+1 : linkB
> ...
> N+1 : sentenceA
> N+2 : sentenceB
> ...
> 300000 : sentenceZZZ
>
> This would allow me to reduce the amount of ram to cache the type of each
> document
>
> -> without remapping, I need at least log2(types) * documents bits
> here 2 * 300000 bits
>
> -> with remapping, I only need to remember the ints M and N
>
> Also, if I need to cache 1 byte of metadata for each book
>
> -> without remapping, I would need 1 byte * documents
> here 300000 bytes
>
> -> with remapping, I would only need 1 byte * books
> here M - 1 bytes
>
>
> I tried building such an index with LogMergePolicy/NoMergePolicy/extending
> the ram buffer but (maybe I did something wrong),
> the docIds were always reshuffled (maybe because my index was big and I was
> over a threshold)
>
>
>
> Best regards,
> Olivier
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
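
The RAM saving described in the quoted message can be sketched as
follows, assuming the index is already laid out as books, then links,
then sentences. All names here (DocTypeLookup, firstLink, firstSentence,
bookMeta) are hypothetical, and docIDs are 0-based, so firstLink plays
the role of M and the per-book array has exactly firstLink entries.

```java
// Hypothetical helper: two boundaries replace a per-document type cache.
final class DocTypeLookup {

    enum DocType { BOOK, LINK, SENTENCE }

    private final int firstLink;      // docID of the first link (M in the message)
    private final int firstSentence;  // docID of the first sentence
    private final byte[] bookMeta;    // one byte per book, not per document

    DocTypeLookup(int firstLink, int firstSentence, byte[] bookMeta) {
        if (firstLink < 0 || firstSentence < firstLink) {
            throw new IllegalArgumentException("boundaries must be ordered");
        }
        if (bookMeta.length != firstLink) {
            throw new IllegalArgumentException("need exactly one byte per book");
        }
        this.firstLink = firstLink;
        this.firstSentence = firstSentence;
        this.bookMeta = bookMeta;
    }

    // Two comparisons, no per-document storage.
    DocType typeOf(int docId) {
        if (docId < firstLink) {
            return DocType.BOOK;
        }
        if (docId < firstSentence) {
            return DocType.LINK;
        }
        return DocType.SENTENCE;
    }

    // Books occupy docIDs [0, firstLink), so the docID indexes the array directly.
    byte bookMetadata(int docId) {
        if (docId < 0 || docId >= firstLink) {
            throw new IllegalArgumentException("docId " + docId + " is not a book");
        }
        return bookMeta[docId];
    }
}
```

For instance, new DocTypeLookup(2, 4, new byte[] {10, 20}) treats docIDs
0-1 as books, 2-3 as links and everything from 4 on as sentences.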
