incubator-couchdb-user mailing list archives

From Nic Pottier <nicpott...@gmail.com>
Subject Re: Building IFI View for Text Queries
Date Wed, 06 Jan 2010 19:10:01 GMT
On Wed, Jan 6, 2010 at 10:48 AM, Chris Anderson <jchris@apache.org> wrote:
> The only catch is that you'll end up with a large index file in the
> long run. Lucene's indexes should be more compact on disk. Lucene also
> has more stemming options and will generally be smarter than your
> tokenizer.
>
> That said, if it works, it works.

Thanks Chris.  I do have a decent amount of experience with Lucene as
well, so I realize it is a great product; I just didn't want to add
another dependency, especially considering that CouchDB is still
changing quite a bit under the hood.
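For context, an inverted-index view like this usually boils down to a
map step that tokenizes each document and emits one (word, doc id) row
per distinct word.  The Python sketch below mimics that map step.  It
assumes a hypothetical `body` field holding the searchable text and
uses a deliberately naive tokenizer (lowercase, split on non-alphanumerics,
no stemming), which is the kind of tokenizer Chris contrasts with Lucene.

```python
import re
from typing import Dict, Iterator, List, Tuple

TEXT_FIELD = "body"  # hypothetical field name; adjust to your documents

_WORD = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase, keep runs of ASCII letters/digits, no stemming."""
    return _WORD.findall(text.lower())


def map_doc(doc: Dict) -> Iterator[Tuple[str, str]]:
    """Mimic a view map function: one (word, doc id) row per distinct word."""
    for word in sorted(set(tokenize(doc.get(TEXT_FIELD, "")))):
        yield word, doc["_id"]


rows = list(map_doc({"_id": "a1", "body": "Couch stores Couch docs"}))
# rows == [("couch", "a1"), ("docs", "a1"), ("stores", "a1")]
```

Without stemming, "stores" and "store" end up as separate keys, which
is one concrete reason a Lucene-backed index can match more queries.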

Is there any way to get insight into how big the index is?  I can see
how big my database is (78 MB with ~11k docs), but I'd be curious to
know how much space that view takes up once stored.
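One way to check, assuming a CouchDB release that exposes design
document info at `GET /{db}/_design/{ddoc}/_info`: the `view_index`
object in that response reports the index file size (as far as I know,
as `disk_size` in 1.x-era releases and as `sizes.file` in later ones).
A small Python sketch that reads either shape; the server URL, database,
and design document names in the usage comment are hypothetical, and no
authentication is assumed.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def view_index_size(info: dict) -> Optional[int]:
    """Return the index file size in bytes from a design-doc _info response.

    Handles two assumed response shapes: view_index.disk_size (older)
    and view_index.sizes.file (newer). Returns None if neither is present.
    """
    index = info.get("view_index", {})
    if "disk_size" in index:
        return index["disk_size"]
    return index.get("sizes", {}).get("file")


def fetch_design_info(base_url: str, db: str, ddoc: str) -> dict:
    """GET /{db}/_design/{ddoc}/_info and decode the JSON body."""
    url = f"{base_url}/{quote(db, safe='')}/_design/{quote(ddoc, safe='')}/_info"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Hypothetical usage:
# info = fetch_design_info("http://localhost:5984", "mydb", "search")
# print(view_index_size(info))
```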

One question I have: it seems rather inefficient to store each
word/id pair individually.  Would there be any value in adding a
reduce step that groups them, so that the view would be
word -> [id array] instead?  I will admit the reduce() step is one I
am still grappling with a bit.
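To make the grouping idea concrete, here is a Python simulation of a
reduce that collects doc ids per word, following the
reduce(keys, values, rereduce) convention: on the first pass the values
are doc ids, and on rereduce they are lists returned by earlier calls
(with keys set to null/None).  `query_grouped` is a hypothetical helper
that roughly imitates querying with group=true by reducing small chunks
and then combining them with rereduce; it is a sketch, not CouchDB's
actual code path.

```python
from itertools import groupby
from typing import Any, Dict, List, Optional, Tuple


def reduce_ids(keys: Optional[List[Any]], values: List[Any], rereduce: bool) -> List[str]:
    """Collect doc ids into one sorted, de-duplicated list."""
    if rereduce:
        # values are lists produced by earlier reduce calls
        ids = {doc_id for chunk in values for doc_id in chunk}
    else:
        # values are the doc ids emitted by the map step
        ids = set(values)
    return sorted(ids)


def query_grouped(rows: List[Tuple[str, str]], chunk_size: int = 2) -> Dict[str, List[str]]:
    """Simulate group=true: one reduced value per distinct key."""
    result = {}
    for word, group in groupby(sorted(rows), key=lambda r: r[0]):
        ids = [doc_id for _, doc_id in group]
        # Reduce chunk-sized pieces first, then merge the partial results
        # with rereduce, loosely like reductions across B-tree nodes.
        partials = [
            reduce_ids([[word, i] for i in ids[n:n + chunk_size]],
                       ids[n:n + chunk_size], False)
            for n in range(0, len(ids), chunk_size)
        ]
        result[word] = reduce_ids(None, partials, True)
    return result


rows = [("couch", "a1"), ("couch", "b2"), ("couch", "c3"), ("docs", "a1")]
# query_grouped(rows) == {"couch": ["a1", "b2", "c3"], "docs": ["a1"]}
```

Two caveats worth weighing.  The map rows are still stored either way,
and a reduce adds its values on top of them, so this does not shrink
the index.  Also, rows with the same key sit next to each other in the
view, so querying the plain map view with key="couch" already returns
all matching ids.  Many CouchDB versions also check that reduce output
stays small (the `reduce_limit` setting), and an ever-growing id array
can trip that check on large views.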

-Nic
