couchdb-user mailing list archives

From "Paul Davis" <paul.joseph.da...@gmail.com>
Subject Re: the search api?
Date Mon, 28 Jul 2008 22:05:42 GMT
Not sure on feasibility, but what would you say to making an Erlang
function that would parse the query and do the boolean logic on the
index? I mean, it's kinda hackish, but it seems like it could be done
fairly easily. Also, it could be the basis for work on merging indices.
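
To make the idea concrete, here is a rough sketch of "parse and do the
boolean logic on the index". It is in Python rather than Erlang, purely
for illustration. Everything in it is an assumption: the index is a
plain in-memory dict of token -> set of doc ids standing in for the
full-text view, and the AND / OR / NOT syntax and the names INDEX,
tokenize and evaluate are made up for the example, not anything CouchDB
provides.

```python
import re

# Hypothetical stand-in for the full-text view: each token maps to the
# set of document ids that contain it. Not a CouchDB API.
INDEX = {
    "couchdb": {"doc1", "doc2", "doc3"},
    "view": {"doc2", "doc3"},
    "erlang": {"doc3", "doc4"},
}


def tokenize(query):
    """Split a query string into parentheses and words."""
    return re.findall(r"\(|\)|[^\s()]+", query)


def evaluate(query, index):
    """Evaluate AND / OR / NOT (with parentheses) to a set of doc ids."""
    tokens = tokenize(query)
    all_docs = set().union(*index.values())  # universe used by NOT
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():  # lowest precedence
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()
        return result

    def parse_and():
        result = parse_not()
        while peek() == "AND":
            take()
            result = result & parse_not()
        return result

    def parse_not():  # highest precedence, binds to the next atom
        if peek() == "NOT":
            take()
            return all_docs - parse_not()
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None or tok in (")", "AND", "OR"):
            raise ValueError("expected a term")
        take()
        if tok == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return result
        # Unknown terms simply match no documents.
        return set(index.get(tok.lower(), set()))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result
```

Note that this still loads the full list of doc ids for every term in
the query before combining them, so it only moves the merge to a
different place; it does not avoid the cost itself.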

Paul

On Mon, Jul 28, 2008 at 5:31 PM, Dean Landolt <dean@deanlandolt.com> wrote:
> I updated http://wiki.apache.org/couchdb/FullTextIndexWithView with a
> slightly more robust implementation. Still no boolean abilities though --
> I'm combing the internets trying to figure out how Google does it in m/r, but
> my best guess is they just brute-force the merge (and probably track some
> stats to guess a total). This doesn't seem like something that would lend
> itself easily to couch -- but I could be wrong. I'm probably wrong. Please,
> someone tell me I'm wrong...
>
> Dean
>
> On Mon, Jul 28, 2008 at 1:18 PM, Dean Landolt <dean@deanlandolt.com> wrote:
>
>> Gladly. I'll get it on the wiki and send a link after I clean it up.
>>
>> Regarding merging views, something like that would be fantastic, though I
>> can't really comprehend the performance implications. If a view can peer
>> into another view for its processing, I gather this would mean it would have
>> to be updated every time a change happens in the referenced view(s), and an
>> incremental update here may really mean a full update of the view in
>> question, but I'm just guessing. Though this would allow real *joins* and
>> end that whole question once and for all... :)
>>
>>
>>
>> On Sun, Jul 27, 2008 at 7:04 PM, Dan Reverri <reverri@gmail.com> wrote:
>>
>>> Dean,
>>>
>>> Any chance you want to share your view code?
>>>
>>> In regards to the query parsing, I am not sure how this will work. Right now
>>> results for each term have to be pulled down to the client and merged
>>> together. Perhaps we could add a query method to views that allows different
>>> key values to be combined.
>>>
>>> A user could query a view with a set of keys and a merge function that
>>> defines how the values for those keys are combined.
>>>
>>> On Fri, Jul 25, 2008 at 5:01 PM, Dean Landolt <dean@deanlandolt.com> wrote:
>>>
>>> > On Mon, Jul 21, 2008 at 11:45 AM, Dean Landolt <dean@deanlandolt.com> wrote:
>>> >
>>> > > On Mon, Jul 21, 2008 at 1:08 AM, Dan Reverri <reverri@gmail.com> wrote:
>>> > >
>>> > >> Is it worthwhile to implement a full text indexer on top of CouchDB's
>>> > >> map/reduce functionality?
>>> > >>
>>> > >> http://wiki.apache.org/couchdb/FullTextIndexWithView
>>> > >>
>>> > >
>>> > >
>>> > > Interesting idea. There's definitely more to FTI than tokenization alone,
>>> > > but then again there's an awful lot of power in m/r and JavaScript -- it
>>> > > didn't take me a second to find a Porter stemming algorithm in JS:
>>> > > http://tartarus.org/~martin/PorterStemmer/js.txt
>>> > >
>>> > > I bet variable weighting would be pretty close to impossible in the m/r
>>> > > paradigm though, and probably some other features (of course, I could be
>>> > > wrong, and when it comes to CouchDB, thus far I usually am). For a
>>> > > straight-up word search, this is serviceable as is. I'm going to see if I
>>> > > can't figure out how to shoehorn in some boolean features.
>>> > >
>>> >
>>> > I gave this approach another look and I was able to get a view together that
>>> > did a little more (stemming, optional case-insensitivity, min length for
>>> > tokens, better whitespace handling). I'm working on an ngram view too and so
>>> > far it's promising. But there's still one huge problem -- for the life of me
>>> > I can't figure out a workable strategy for boolean operations that doesn't
>>> > involve fully loading each piece of the query. Am I missing something? Is
>>> > something like this even possible? I know there's no way to load a piece of
>>> > a view from another view -- but I just can't help but really wish there
>>> > were.
>>> >
>>>
>>
>>
>
