lucene-dev mailing list archives

From eks dev <eks...@yahoo.co.uk>
Subject Re: Polymorphic Index
Date Fri, 22 Oct 2010 14:21:03 GMT
Thanks Grant,
this sounds good.

https://issues.apache.org/jira/browse/LUCENE-1812
and 
https://issues.apache.org/jira/browse/LUCENE-2632

I hadn't noticed them before because of the high-volume, high-quality traffic
here in the Lucene world; one cannot keep up :)


I will have to look into them in detail.

With pruning, the problem is going to be preserving this "write once"
benefit for slave updates (copy deltas and reload()).
Update the full index by adding/deleting a few docs -> commit -> prune -> update
the slaves incrementally? Will that work?


I will have to check what this pruning codec produces (a single merge along the
way and I need a full update of the slaves...).
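The concern about keeping the "write once" benefit can be made concrete with a small model. If index files are never modified after being written, a slave only has to fetch the files it does not already have, so a small commit costs a small copy, while any step that rewrites all segments (a merge, or a pruning pass that writes a new index) costs a full copy. A minimal Java sketch, assuming files are identified by name with known sizes; the file names and sizes are made up, and this is not the actual Lucene/Solr replication code.

```java
import java.util.Map;
import java.util.TreeMap;

public class SegmentDelta {

    // Total bytes a slave must fetch: files on the master (name -> size in bytes)
    // that the slave does not already have. Relies on files never changing once written.
    static long bytesToCopy(Map<String, Long> master, Map<String, Long> slave) {
        long total = 0;
        for (Map.Entry<String, Long> e : master.entrySet()) {
            if (!slave.containsKey(e.getKey())) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical files already on the slave: one big segment, one small one.
        Map<String, Long> slave = new TreeMap<>();
        slave.put("_0.seg", 1_000_000_000L);
        slave.put("_1.seg", 10_000_000L);

        // Case 1: a few docs added/deleted and committed; existing files stay
        // untouched and one small new segment appears.
        Map<String, Long> afterCommit = new TreeMap<>(slave);
        afterCommit.put("_2.seg", 1_000_000L);

        // Case 2: a pruning pass (or merge) rewrites all data into a new segment.
        Map<String, Long> afterRewrite = new TreeMap<>();
        afterRewrite.put("_3.seg", 1_005_000_000L);

        System.out.println(bytesToCopy(afterCommit, slave));  // 1000000
        System.out.println(bytesToCopy(afterRewrite, slave)); // 1005000000
    }
}
```

Real replication also has to copy the new commit point (segments file) and any new deletion files, which this toy ignores.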

And judging from the JIRA descriptions, TeeSinkCodec and FilteringCodec look
like exactly the solution! Sounds too good to be true.
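The tee idea, as it relates to option 2 in the quoted message (two indices kept in sync on docIDs), can be pictured with a toy model: every document goes to both indices in the same order, one copy holding only the UID and the other holding everything else. This is hypothetical Java with in-memory lists standing in for indices; it is not the codec from the JIRA issue or any Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TeeByField {

    // Toy "index": a document's docID is simply its position in the list.
    static final class ToyIndex {
        final List<Map<String, String>> docs = new ArrayList<>();

        int add(Map<String, String> doc) {
            docs.add(doc);
            return docs.size() - 1; // sequential docID
        }
    }

    final ToyIndex uidIndex = new ToyIndex();  // holds only the split-off field
    final ToyIndex mainIndex = new ToyIndex(); // holds every other field
    final String splitField;

    TeeByField(String splitField) {
        this.splitField = splitField;
    }

    // Routes one document to both indices. Both assign the same docID because
    // every add goes to both, in the same order.
    int add(Map<String, String> doc) {
        Map<String, String> uidPart = new HashMap<>();
        Map<String, String> rest = new HashMap<>(doc);
        String value = rest.remove(splitField);
        if (value != null) {
            uidPart.put(splitField, value);
        }
        int a = uidIndex.add(uidPart);
        int b = mainIndex.add(rest);
        if (a != b) {
            throw new IllegalStateException("docIDs out of sync");
        }
        return a;
    }

    public static void main(String[] args) {
        TeeByField tee = new TeeByField("uid");
        tee.add(Map.of("uid", "uid-0001", "body", "lucene"));
        int id = tee.add(Map.of("uid", "uid-0002", "body", "solr"));
        System.out.println(id);                        // 1
        System.out.println(tee.mainIndex.docs.get(1)); // {body=solr}
        System.out.println(tee.uidIndex.docs.get(1));  // {uid=uid-0002}
    }
}
```

The sync only holds if deletions and merges are applied identically to both indices; Lucene's ParallelReader, which combines indices field-wise, depends on the same matching document numbering.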


Thanks again!
Eks




----- Original Message ----
> From: Grant Ingersoll <gsingers@apache.org>
> To: dev@lucene.apache.org
> Sent: Fri, 22 October, 2010 15:26:31
> Subject: Re: Polymorphic Index
> 
> 
> On Oct 21, 2010, at 3:44 PM, eks dev wrote:
> 
> > Hi All,
> > I am trying to figure out a way to implement the following use case with
> > Lucene/Solr.
> > 
> > 
> > In order to support simple incremental updates (master), I need to index and
> > store a UID field on a 300Mio-document collection (my UID is a 32-byte
> > sequence). But I do not need it indexed (only stored) during normal
> > searching (slaves).
> > 
> > 
> > The problem is that my term dictionary gets blown away by the sheer number of
> > unique IDs. The number of unique terms in this collection, excluding the UID,
> > is less than 7Mio.
> > I can tolerate a resource hit on the updater (big hardware, on-disk index...).
> > 
> > This is a master/slave setup where searchers run from a RAMDisk, and holding
> > 300Mio * 32 bytes (give or take prefix compression) plus pointers to postings
> > and the postings themselves is something I would really love to avoid, as
> > this is significant compared to the really small documents I have.
> > 
> > 
> > Cutting to the chase:
> > How can I have an indexed UID field and, when done with indexing:
> > 1) load a "searchable" index into RAM from such an on-disk index, without
> > that one field?
> 
> That doesn't seem like it would be all that hard to do in Lucene with a few
> edits to the appropriate low-level classes to simply not load the term
> dictionary for a particular set of fields (pass in a set?). This sort of
> masking even seems like a generally useful performance gain in the typical
> master/worker replicated environment.
> 
> > 
> > 2) create two indices kept in sync on docIDs, one containing only the indexed UID
> 
> Kind of reminds me of Andrzej's pruning codec stuff. Perhaps the new Flex
> stuff helps here?
> 
> > 3) somehow transform the index with the indexed UID by dropping the UID field,
> > preserving docIDs. A kind of smart index-editing tool.
> 
> Again, take a look at Andrzej's pruning codec.
> 
> -Grant
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For  additional commands, e-mail: dev-help@lucene.apache.org
> 
> 
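Grant's masking suggestion in the quoted message can be pictured with a toy model. The sketch below is hypothetical Java, not Lucene's API: it assumes a term dictionary stored as (field, term, postings pointer) entries and a loader that takes a set of field names to skip, which is roughly what "pass in a set?" suggests. It also computes the raw key size of the UID terms from the numbers in the question: 300Mio * 32 bytes is about 9.6 GB before prefix compression, pointers and postings.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class FieldMaskingLoader {

    // One term-dictionary entry as it might be stored on disk (hypothetical layout).
    static final class TermEntry {
        final String field;
        final String term;
        final long postingsPointer;

        TermEntry(String field, String term, long postingsPointer) {
            this.field = field;
            this.term = term;
            this.postingsPointer = postingsPointer;
        }
    }

    // Builds the in-RAM dictionary (field -> term -> postings pointer), skipping
    // every entry whose field is in maskedFields. The masked field remains
    // indexed on disk for the master; it is simply never loaded on the slave.
    static Map<String, TreeMap<String, Long>> load(List<TermEntry> onDisk, Set<String> maskedFields) {
        Map<String, TreeMap<String, Long>> inRam = new TreeMap<>();
        for (TermEntry e : onDisk) {
            if (maskedFields.contains(e.field)) {
                continue;
            }
            inRam.computeIfAbsent(e.field, f -> new TreeMap<>()).put(e.term, e.postingsPointer);
        }
        return inRam;
    }

    // Raw term bytes for fixed-length keys, before prefix compression and
    // excluding postings pointers and the postings themselves.
    static long rawTermBytes(long uniqueTerms, int bytesPerTerm) {
        return uniqueTerms * bytesPerTerm;
    }

    public static void main(String[] args) {
        List<TermEntry> onDisk = List.of(
                new TermEntry("body", "lucene", 0L),
                new TermEntry("body", "solr", 64L),
                new TermEntry("uid", "uid-0001", 128L),
                new TermEntry("uid", "uid-0002", 192L));

        Map<String, TreeMap<String, Long>> slaveView = load(onDisk, Set.of("uid"));
        System.out.println(slaveView.keySet());           // [body]
        System.out.println(slaveView.get("body").size()); // 2

        // 300Mio unique 32-byte UIDs, the figures from the quoted message.
        System.out.println(rawTermBytes(300_000_000L, 32)); // 9600000000
    }
}
```

With the whole index on a RAMDisk, masking at load time alone would not shrink the UID files themselves; that is what options 2 and 3 in the question address.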


      
