incubator-couchdb-user mailing list archives

From "Dean Landolt" <d...@deanlandolt.com>
Subject Re: the search api?
Date Mon, 28 Jul 2008 21:31:09 GMT
I updated http://wiki.apache.org/couchdb/FullTextIndexWithView with a
slightly more robust implementation. Still no boolean abilities though --
I'm combing the internets trying to figure out how Google does it in m/r, but
my best guess is they just brute-force the merge (and probably track some
stats to guess a total). This doesn't seem like something that would lend
itself easily to couch -- but I could be wrong. I'm probably wrong. Please,
someone tell me I'm wrong...

Dean
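
For reference, the brute-force merge discussed above can be sketched on the client side. This is only an illustration, not the code on the wiki page: it assumes each search term has already been queried separately against the view and that each response's rows carry the document id in an `id` field. The names `mergeAnd` and `mergeOr` are hypothetical.

```javascript
// Hypothetical client-side merge of per-term view results.
// rowsByTerm: one array per search term, each holding view rows
// shaped like { id: "docid", key: "term", value: null } (assumed shape).

// AND: keep only the document ids that appear in every term's rows.
function mergeAnd(rowsByTerm) {
  if (rowsByTerm.length === 0) return [];
  // Build one id set per term and start from the smallest set,
  // so the fewest candidates need to be checked.
  const sets = rowsByTerm
    .map(rows => new Set(rows.map(row => row.id)))
    .sort((a, b) => a.size - b.size);
  const [smallest, ...rest] = sets;
  return [...smallest].filter(id => rest.every(s => s.has(id))).sort();
}

// OR: collect every document id that appears for at least one term.
function mergeOr(rowsByTerm) {
  const all = new Set();
  for (const rows of rowsByTerm) {
    for (const row of rows) all.add(row.id);
  }
  return [...all].sort();
}

// Example input: rows for two terms, as they might come back from the view.
const fooRows = [{ id: "a", key: "foo" }, { id: "b", key: "foo" }, { id: "c", key: "foo" }];
const barRows = [{ id: "b", key: "bar" }, { id: "c", key: "bar" }, { id: "d", key: "bar" }];

const both = mergeAnd([fooRows, barRows]);   // ids matching foo AND bar
const either = mergeOr([fooRows, barRows]);  // ids matching foo OR bar
```

Note that this still loads the full result set for every term, which is exactly the cost the thread is trying to avoid; it only shows what the merge itself involves.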

On Mon, Jul 28, 2008 at 1:18 PM, Dean Landolt <dean@deanlandolt.com> wrote:

> Gladly. I'll get it on the wiki and send a link after I clean it up.
>
> Regarding merging views, something like that would be fantastic, though I
> can't really comprehend the performance implications. If a view can peer
> into another view for its processing, I gather this would mean it would have
> to be updated every time a change happens in the referenced view(s), and an
> incremental update here may really mean a full update of the view in
> question, but I'm just guessing. Though this would allow real *joins* and
> end that whole question once and for all... :)
>
>
>
> On Sun, Jul 27, 2008 at 7:04 PM, Dan Reverri <reverri@gmail.com> wrote:
>
>> Dean,
>>
>> Any chance you want to share your view code?
>>
>> In regards to the query parsing, I am not sure how this will work. Right
>> now results for each term have to be pulled down to the client and merged
>> together. Perhaps we could add a query method to views that allows
>> different key values to be combined.
>>
>> A user could query a view with a set of keys and a merge function that
>> could define how the key values could be combined.
>>
>> On Fri, Jul 25, 2008 at 5:01 PM, Dean Landolt <dean@deanlandolt.com>
>> wrote:
>>
>> > On Mon, Jul 21, 2008 at 11:45 AM, Dean Landolt <dean@deanlandolt.com>
>> > wrote:
>> >
>> > > On Mon, Jul 21, 2008 at 1:08 AM, Dan Reverri <reverri@gmail.com> wrote:
>> > >
>> > >> Is it worthwhile to implement a full text indexer on top of CouchDB's
>> > >> map/reduce functionality?
>> > >>
>> > >> http://wiki.apache.org/couchdb/FullTextIndexWithView
>> > >>
>> > >
>> > >
>> > > Interesting idea. There's definitely more to FTI than tokenization
>> > > alone, but then again there's an awful lot of power in m/r and
>> > > javascript -- it didn't take me a second to find a Porter stemming
>> > > algorithm in js:
>> > > http://tartarus.org/~martin/PorterStemmer/js.txt
>> > >
>> > > I bet variable weighting would be pretty close to impossible in the
>> > > m/r paradigm though, and probably some other features (of course, I
>> > > could be wrong, and when it comes to couchdb, thus far I usually am).
>> > > For a straight-up word search, this is serviceable as is. I'm going to
>> > > see if I can't figure out how to shoehorn in some boolean features.
>> > >
>> >
>> > I gave this approach another look and I was able to get a view together
>> > that did a little more (stemming, optional case-insensitivity, min length
>> > for tokens, better whitespace handling). I'm working on an ngram view too
>> > and so far it's promising. But there's still one huge problem -- for the
>> > life of me I can't figure out a workable strategy for boolean operations
>> > that doesn't involve fully loading each piece of the query. Am I missing
>> > something? Is something like this even possible? I know there's no way to
>> > load a piece of a view from another view -- but I just can't help but
>> > really wish there were.
>> >
>>
>
>
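
The view described in the quoted Jul 25 message (optional case-insensitivity, a minimum token length, whitespace handling, and a stemming hook) might look roughly like the sketch below. It is not the implementation from the wiki page. The `body` field, the constants, and the names `tokenize`, `stem`, and `mapDoc` are assumptions made for illustration. In CouchDB the view server supplies `emit` to the map function; here it is passed in as a parameter so the snippet runs on its own.

```javascript
// Assumed settings, chosen only for this sketch.
const MIN_LENGTH = 3;          // drop tokens shorter than this
const CASE_INSENSITIVE = true; // fold everything to lower case

// Placeholder stemmer: returns the word unchanged. A real stemmer
// (for example a Porter stemmer) could be swapped in here.
const stem = word => word;

// Split text on runs of non-alphanumeric characters, which also
// collapses repeated whitespace and strips punctuation.
function tokenize(text) {
  const normalized = CASE_INSENSITIVE ? text.toLowerCase() : text;
  return normalized
    .split(/[^a-z0-9]+/i)
    .filter(token => token.length >= MIN_LENGTH)
    .map(stem);
}

// Map step: emit one row per distinct token in the document,
// keyed by the token. `doc.body` is a hypothetical text field.
function mapDoc(doc, emit) {
  if (typeof doc.body !== "string") return;
  const seen = new Set();
  for (const token of tokenize(doc.body)) {
    if (!seen.has(token)) {
      seen.add(token);
      emit(token, null);
    }
  }
}

// Standalone usage: collect the emitted keys into an array.
const emitted = [];
mapDoc(
  { _id: "doc1", body: "The  quick brown Fox, the fox!" },
  (key, value) => emitted.push(key)
);
```

Querying such a view by key returns the documents containing one token, so a multi-term boolean query still needs one lookup per term followed by a merge.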
