lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: Polymorphic Index
Date Fri, 22 Oct 2010 13:26:31 GMT

On Oct 21, 2010, at 3:44 PM, eks dev wrote:

> Hi All, 
> I am trying to figure out a way to implement the following use case with
> lucene/solr.
> In order to support simple incremental updates (master) I need to index and
> store a UID field on a 300Mio collection. (My UID is a 32-byte sequence.) But I do
> not need it indexed (only stored) during normal searching (slaves).
> The problem is that my term dictionary gets blown up by the sheer number of
> unique IDs. The number of unique terms in this collection, excluding UID, is less
> than 7Mio.
> I can tolerate the resource hit on the updater (big hardware, on-disk index...).
> This is a master/slave setup, where searchers run from a RAM disk, and having
> 300Mio * 32 bytes (give or take prefix compression) plus pointers to postings and
> postings is something I would really love to avoid, as this is significant
> compared to the really small documents I have.
> Cutting to the chase:
> How can I have an indexed UID field and, when done with indexing:
> 1) Load a "searchable" index into RAM from such an on-disk index, without that
> one field?

That doesn't seem like it would be all that hard to do in Lucene with a few edits to the appropriate
low level classes to simply not load the term dictionary for a particular set of fields (pass
in a set?).  This sort of masking even seems like a generally useful performance gain in the
typical master/worker replicated environment.
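
To make the masking idea a bit more concrete, here is a rough sketch in plain Java. It is a toy model only, not Lucene's actual classes or file format: `MaskedTermDictionaryDemo`, `loadTermDictionary`, and `rawTermBytes` are hypothetical names made up for illustration, and the "on-disk" dictionary is just a map from field name to sorted terms. It shows the two points discussed above: skipping a passed-in set of fields at load time, and the raw size of the UID terms (300 million * 32 bytes, before prefix compression and not counting pointers or postings).

```java
import java.util.*;

/** Toy model of field masking at load time. NOT Lucene's API. */
class MaskedTermDictionaryDemo {

    /**
     * Copies the per-field term lists into memory, skipping every field
     * named in maskedFields. The onDisk map stands in for whatever the
     * real low-level reader would iterate over (an assumption).
     */
    static Map<String, SortedSet<String>> loadTermDictionary(
            Map<String, SortedSet<String>> onDisk, Set<String> maskedFields) {
        Map<String, SortedSet<String>> inMemory = new TreeMap<>();
        for (Map.Entry<String, SortedSet<String>> e : onDisk.entrySet()) {
            if (maskedFields.contains(e.getKey())) {
                continue; // masked field: its terms are never loaded
            }
            inMemory.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        return inMemory;
    }

    /** Raw term bytes for a field, ignoring compression, pointers, and postings. */
    static long rawTermBytes(long uniqueTerms, int bytesPerTerm) {
        return uniqueTerms * bytesPerTerm;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> onDisk = new TreeMap<>();
        onDisk.put("uid", new TreeSet<>(Arrays.asList("id-0001", "id-0002")));
        onDisk.put("body", new TreeSet<>(Arrays.asList("index", "lucene")));

        // The searcher (slave) masks the UID field; the updater (master) would not.
        Map<String, SortedSet<String>> searcherView =
                loadTermDictionary(onDisk, Collections.singleton("uid"));
        System.out.println("Fields loaded on searcher: " + searcherView.keySet());

        // 300 million UIDs of 32 bytes each, using the figures from the question.
        System.out.println("Raw UID term bytes avoided: "
                + rawTermBytes(300_000_000L, 32));
    }
}
```

With the figures from the question, the raw UID terms alone come to 9,600,000,000 bytes (about 9.6 GB) that the searchers would not have to hold. In a real patch the set of masked fields would have to be passed down to whichever low-level class reads the term dictionary; where exactly that hook belongs is left open here.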

> 2) Create two indices in sync on docIDs, one containing only the indexed UID?

Kind of reminds me of Andrzej's pruning codec stuff.  Perhaps the new Flex stuff helps here?

> 3) Somehow transform the index with the indexed UID by dropping the UID field,
> preserving docIDs? A kind of smart index-editing tool.

Again, take a look at Andrzej's pruning codec.
