lucene-dev mailing list archives

From eks dev <eks...@yahoo.co.uk>
Subject Polymorphic Index
Date Thu, 21 Oct 2010 19:44:34 GMT
Hi All, 
I am trying to figure out a way to implement the following use case with
Lucene/Solr.


In order to support simple incremental updates (on the master), I need to
index and store a UID field on a collection of 300 million documents (my UID
is a 32-byte sequence). However, I do not need it indexed (only stored)
during normal searching (on the slaves).


The problem is that my term dictionary gets blown up by the sheer number of
unique IDs. The number of unique terms in this collection, excluding the
UIDs, is less than 7 million.
I can tolerate the resource hit on the updater (big hardware, on-disk
index...).

This is a master-slave setup where the searchers run from a RAM disk, and
having 300 million * 32 bytes (give or take prefix compression), plus
pointers to postings and the postings themselves, is something I would
really love to avoid, as this is significant compared to the really small
documents I have.
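
To put a rough number on the UID terms alone, here is a back-of-the-envelope
sketch. It counts only the raw UID bytes and deliberately ignores prefix
compression, postings pointers and the postings themselves, so the real
footprint will differ; the function name is just for illustration.

```python
# Rough estimate of the raw bytes taken by the UID terms alone.
# Assumption: no prefix compression, no postings pointers, no postings.
NUM_DOCS = 300_000_000   # documents in the collection
UID_BYTES = 32           # each UID is a 32-byte sequence


def uid_term_bytes(num_docs=NUM_DOCS, uid_bytes=UID_BYTES):
    """Return the raw number of bytes needed to hold one UID per document."""
    return num_docs * uid_bytes


raw = uid_term_bytes()
print(f"{raw / 10**9:.1f} GB of raw UID term bytes")  # prints "9.6 GB of raw UID term bytes"
```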


Cutting to the chase:
How can I have an indexed UID field and, when done with indexing, do one of
the following:
1) Load a "searchable" index into RAM from such an on-disk index, without
that one field?

2) Create two indices kept in sync on docIDs, one containing only the
indexed UID?
3) Somehow transform the index with the indexed UID by dropping the UID
field while preserving docIDs? A kind of smart index-editing tool.
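
To make option 2 concrete, here is a toy model of what I mean by two
structures in sync on docIDs. This is plain Python, not Lucene API; the
class and method names are made up for illustration, and a dict stands in
for the indexed UID terms. The updater uses the UID lookup, while a searcher
would load only the stored side.

```python
class ParallelPair:
    """Toy model (not Lucene): two structures sharing docIDs (list positions)."""

    def __init__(self):
        self.stored = []       # docID -> stored fields (what searchers load)
        self.uids = []         # docID -> UID (updater-only side)
        self.uid_to_doc = {}   # UID -> live docID (stands in for indexed UID terms)
        self.deleted = set()   # docIDs marked as deleted

    def add(self, uid, fields):
        """Add or replace a document by UID and return its new docID."""
        old = self.uid_to_doc.get(uid)
        if old is not None:
            self.deleted.add(old)          # incremental update: delete, then re-add
        doc_id = len(self.stored)
        self.stored.append({"uid": uid, **fields})  # the UID stays as a stored field
        self.uids.append(uid)              # same position, so docIDs stay aligned
        self.uid_to_doc[uid] = doc_id
        return doc_id

    def searcher_view(self):
        """Live documents as seen by a searcher that never loads the UID lookup."""
        return [(d, f) for d, f in enumerate(self.stored) if d not in self.deleted]


pair = ParallelPair()
pair.add("a", {"title": "first"})
pair.add("b", {"title": "second"})
pair.add("a", {"title": "first, updated"})   # replaces docID 0 with docID 2
```

The only invariant that matters here is that both sides always have the same
length, so a docID means the same document in each.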

Or is there something else already out there that I do not know about?

Preserving docIDs is crucial, as I need support for lovely incremental
updates (like Solr master-slave updates). The stored field should also
remain!
I am not looking for "use an mmapped index and let the OS deal with it"
advice...
I do not mind doing it with the flex branch (4.0), not being in a hurry.

Thanks in advance, 
Eks 


      
