incubator-couchdb-user mailing list archives

From "Paul Davis" <paul.joseph.da...@gmail.com>
Subject Re: the search api?
Date Mon, 28 Jul 2008 22:26:07 GMT
The "Programming Erlang" book by Armstrong is good.

On Mon, Jul 28, 2008 at 6:17 PM, Dean Landolt <dean@deanlandolt.com> wrote:
> First I'd say I probably ought to learn Erlang. Anybody have any good
> tutorials/resources for a complete virgin? I really don't have the time for
> this, but what can you do -- it's inevitable, I guess -- why fight it?
>
>
> On Mon, Jul 28, 2008 at 6:05 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>
>> Not sure on feasibility, but what would you say to making an Erlang
>> function that would parse and do the boolean logic on the index? I
>> mean it's kinda hackish, but it seems like it could be done fairly
>> easily. Also, it could be the basis for work on merging indexes.
>>
>> Paul
>>
>> On Mon, Jul 28, 2008 at 5:31 PM, Dean Landolt <dean@deanlandolt.com> wrote:
>> > I updated http://wiki.apache.org/couchdb/FullTextIndexWithView with a
>> > slightly more robust implementation. Still no boolean abilities though --
>> > I'm combing the internets trying to figure out how Google does it in m/r,
>> > but my best guess is they just brute-force the merge (and probably track
>> > some stats to guess a total). This doesn't seem like something that would
>> > lend itself easily to couch -- but I could be wrong. I'm probably wrong.
>> > Please, someone tell me I'm wrong...
>> >
>> > Dean
>> >
>> > On Mon, Jul 28, 2008 at 1:18 PM, Dean Landolt <dean@deanlandolt.com> wrote:
>> >
>> >> Gladly. I'll get it on the wiki and send a link after I clean it up.
>> >>
>> >> Regarding merging views, something like that would be fantastic, though I
>> >> can't really comprehend the performance implications. If a view can peer
>> >> into another view for its processing, I gather this would mean it would
>> >> have to be updated every time a change happens in the referenced view(s),
>> >> and an incremental update here may really mean a full update of the view
>> >> in question, but I'm just guessing. Though this would allow real *joins*
>> >> and end that whole question once and for all... :)
>> >>
>> >>
>> >>
>> >> On Sun, Jul 27, 2008 at 7:04 PM, Dan Reverri <reverri@gmail.com> wrote:
>> >>
>> >>> Dean,
>> >>>
>> >>> Any chance you want to share your view code?
>> >>>
>> >>> In regards to the query parsing, I am not sure how this will work. Right
>> >>> now results for each term have to be pulled down to the client and merged
>> >>> together. Perhaps we could add a query method to views that allows
>> >>> different key values to be combined.
>> >>>
>> >>> A user could query a view with a set of keys and a merge function that
>> >>> defines how the key values are combined.
>> >>>
>> >>> On Fri, Jul 25, 2008 at 5:01 PM, Dean Landolt <dean@deanlandolt.com> wrote:
>> >>>
>> >>> > On Mon, Jul 21, 2008 at 11:45 AM, Dean Landolt <dean@deanlandolt.com> wrote:
>> >>> >
>> >>> > > On Mon, Jul 21, 2008 at 1:08 AM, Dan Reverri <reverri@gmail.com> wrote:
>> >>> > >
>> >>> > >> Is it worthwhile to implement a full text indexer on top of CouchDB's
>> >>> > >> map/reduce functionality?
>> >>> > >>
>> >>> > >> http://wiki.apache.org/couchdb/FullTextIndexWithView
>> >>> > >>
>> >>> > >
>> >>> > >
>> >>> > > Interesting idea. There's definitely more to FTI than tokenization alone,
>> >>> > > but then again there's an awful lot of power in m/r and javascript -- it
>> >>> > > didn't take me a second to find a porter stemming algorithm in js:
>> >>> > > http://tartarus.org/~martin/PorterStemmer/js.txt
>> >>> > >
>> >>> > > I bet variable weighting would be pretty close to impossible in the m/r
>> >>> > > paradigm though, and probably some other features (of course, I could be
>> >>> > > wrong, and when it comes to couchdb, thus far I usually am). For a
>> >>> > > straight-up word search, this is serviceable as is. I'm going to see if I
>> >>> > > can't figure out how to shoehorn in some boolean features.
>> >>> > >
>> >>> >
>> >>> > I gave this approach another look and I was able to get a view together
>> >>> > that did a little more (stemming, optional case-insensitivity, min length
>> >>> > for tokens, better whitespace handling). I'm working on an ngram view too
>> >>> > and so far it's promising. But there's still one huge problem -- for the
>> >>> > life of me I can't figure out a workable strategy for boolean operations
>> >>> > that doesn't involve fully loading each piece of the query. Am I missing
>> >>> > something? Is something like this even possible? I know there's no way to
>> >>> > load a piece of a view from another view -- but I just can't help but
>> >>> > really wish there were.
>> >>> >
>> >>>
>> >>
>> >>
>> >
>>
>
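
The client-side merge discussed in the thread (pull down the results for each
term, then combine them) can be sketched roughly as below. This is only an
illustration under stated assumptions, not the code from the wiki page: the
in-memory dict stands in for the rows a full-text view would return, and the
names tokenize, build_index, search_and, search_or and MIN_TOKEN_LENGTH are all
hypothetical. Stemming is left out to keep the sketch short.

```python
import re
from collections import defaultdict

# Hypothetical cutoff; the thread only mentions "min length for tokens".
MIN_TOKEN_LENGTH = 3


def tokenize(text, min_length=MIN_TOKEN_LENGTH):
    """Lowercase, split on runs of non-alphanumerics, drop short tokens."""
    parts = re.split(r"[^a-z0-9]+", text.lower())
    return [t for t in parts if len(t) >= min_length]


def build_index(docs):
    """Stand-in for the view: map each token to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_and(index, query):
    """Boolean AND: intersect the per-term id sets, smallest set first."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*sorted(postings, key=len))


def search_or(index, query):
    """Boolean OR: union of the per-term id sets."""
    return set().union(*(index.get(term, set()) for term in tokenize(query)))


docs = {
    "a": "CouchDB views use map/reduce",
    "b": "Erlang powers CouchDB",
    "c": "Map functions emit keys",
}
index = build_index(docs)
```

With a real view, each index lookup would be one request keyed on the term, so
every term's full result set still crosses the wire before the merge, which is
the cost the thread is worried about.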
