couchdb-user mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: Multiple filters on a large data set
Date Fri, 26 Sep 2008 19:42:12 GMT
Oops, I meant to respond to Jaap van der Plas with this message. Sorry  
Paul.

On Sep 26, 2008, at 3:37 PM, Damien Katz wrote:

> Your requirements as stated would be well met by something like  
> Lucene.
>
> However, another possible way to go about this is to permute the key  
> sets into key arrays and emit each. The number of keys would  
> normally be (N!)/2, where N is the number of fields you are  
> indexing. However, we can use view collation to do range lookups,  
> which allows us to ignore the different array key suffixes. That would  
> reduce the number of key arrays emitted per document to 2^N. If each  
> document has 10 fields, then the number of permutations would be  
> 2^10 or 1024 keys emitted per doc.
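
A minimal sketch of the subset-key idea above, written in Python rather than as an actual CouchDB map function. The helper names (`subset_keys`, `has_prefix`) and the `[field, value, ...]` key layout are assumptions made for illustration and are not part of the original proposal. The sketch assumes a fixed field order, so each subset of fields shows up as exactly one key array, and it models a collation range lookup as a plain prefix match.

```python
from itertools import combinations


def subset_keys(doc, fields):
    """Return one key array per subset of `fields` (2^N arrays in total).

    Fields are taken in sorted order so that a given subset always
    produces the same key array, whatever order the caller passes.
    """
    ordered = sorted(fields)
    keys = []
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            key = []
            for name in combo:
                # Flatten to [field, value, field, value, ...]
                key.extend([name, doc[name]])
            keys.append(key)
    return keys


def has_prefix(key, prefix):
    """Stand-in for a range lookup: does `key` start with `prefix`?"""
    return key[:len(prefix)] == prefix


doc = {"brand": "acme", "color": "red", "size": "L"}
keys = subset_keys(doc, doc.keys())
print(len(keys))  # 8 key arrays for 3 fields

# Rows that would fall under the prefix ["brand", "acme"]
narrowed = [k for k in keys if has_prefix(k, ["brand", "acme"])]
```

In a real view, the prefix match would presumably be expressed as a start/end key range over the emitted arrays; that mapping is not shown here.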
>
> To build that index for 50,000 documents would take an on-disk view  
> index of roughly 50,000,000 rows (50,000 x 1024). Building it will  
> take a very long time and it will take a lot of disk space. But once  
> built, it should then be possible to do the categorized, drill-down  
> searches that can show you relevant sub-categories and their counts  
> to further narrow down the search, and do so pretty efficiently. This  
> is very much the kind of thing Endeca does for online retailers.
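
The row estimate can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic under the assumptions stated in the message (every document has the same number of indexed fields and emits 2^N keys); `view_rows` is a hypothetical helper name.

```python
def view_rows(num_docs, num_fields):
    """Rows in the view if each document emits 2^N key arrays."""
    return num_docs * 2 ** num_fields


rows = view_rows(50_000, 10)
print(f"{rows:,}")  # 51,200,000
```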
>
> I don't know if CouchDB views are up to it yet, but it might be  
> worth experimenting.
>
> -Damien
>
>
> On Sep 26, 2008, at 2:11 PM, Paul Davis wrote:

