couchdb-user mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: Building IFI View for Text Queries
Date Wed, 06 Jan 2010 20:39:28 GMT
On Wed, Jan 6, 2010 at 11:10 AM, Nic Pottier <nicpottier@gmail.com> wrote:
> On Wed, Jan 6, 2010 at 10:48 AM, Chris Anderson <jchris@apache.org> wrote:
>> The only catch is that you'll end up with a large index file in the
>> long run. Lucene's indexes should be more compact on disk. Lucene also
>> has more stemming options and will generally be smarter than your
>> tokenizer.
>>
>> That said, if it works, it works.
>
> Thanks Chris.  I do have a decent amount of experience with Lucene as
> well, so I realize it is a great product; I just didn't want to add
> another dependency, especially considering that CouchDB is still
> changing quite a bit under the hood.
>
> Any way to get insight into how big the index is?  I can see how
> big my database is (78M with ~11k docs) but I'd be curious to know how
> big that view is stored in memory.

The view is stored on disk. Look in the CouchDB data directory
/usr/local/var/lib/couchdb for the view directory.
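For example, a quick way to check the sizes from a shell (a sketch assuming the default source-install data path and a hypothetical database named "mydb" -- adjust both for your setup):

```shell
# Database file vs. its view index files; the path and "mydb" are
# assumptions, not taken from Nic's setup.
du -sh /usr/local/var/lib/couchdb/mydb.couch
du -sh /usr/local/var/lib/couchdb/.mydb_design/*.view
```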

>
> One question I have is that it seems like it is rather inefficient to
> store each word/id pair individually.  Would there be any value to
> adding a reduce step that groups them so that the view would be
> word->[id array] instead?  I will admit the reduce() step is one I am
> still grappling with a bit.
>

Our reduce is not key-bounded, so [id array] would end up being the
list of unique ids in the entire database for full-reduce.
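The view under discussion can be sketched as follows (a minimal sketch, assuming docs carry a `text` field and a whitespace-ish tokenizer; `emit` is normally supplied by CouchDB's view server, so a tiny stand-in is defined here to make the sketch self-contained):

```javascript
// Stand-in for the view server's emit(), so this runs standalone.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Map: one row per unique word per document (word -> doc id).
function map(doc) {
  var seen = {};
  (doc.text || "").toLowerCase().split(/\W+/).forEach(function (w) {
    if (w && !seen[w]) { seen[w] = true; emit(w, doc._id); }
  });
}

// The tempting word -> [id array] reduce. Because reductions are also
// computed over interior b-tree nodes spanning many keys, the
// full-reduce value is the accumulated id list for the whole database.
function reduce(keys, values, rereduce) {
  return [].concat.apply([], values); // grows without bound -- don't do this
}

map({ _id: "a", text: "couch db rocks" });
map({ _id: "b", text: "couch rocks" });
```

Querying the plain map view with `?key="couch"` already returns the matching doc ids as rows, so the reduce buys nothing that the row set doesn't give you.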

The storage inefficiency you describe is likely what would force you
from a pure Couch to a Lucene FTI solution first, as your data begins
to scale.

Chris



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
