incubator-couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: the search api?
Date Mon, 28 Jul 2008 23:54:30 GMT
There is also "Thinking in Erlang":
http://www.google.com/search?q=thinking%20in%20erlang

Cheers
Jan
--
On Jul 29, 2008, at 00:17, Dean Landolt wrote:

> First I'd say I probably ought to learn Erlang. Anybody have any good
> tutorials/resources for a complete virgin? I really don't have the time
> for this, but what can you do -- it's inevitable, I guess -- why fight it?
>
>
> On Mon, Jul 28, 2008 at 6:05 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>
>> Not sure on feasibility, but what would you say to making an Erlang
>> function that would parse and do the boolean logic on the index? I
>> mean it's kinda hackish, but it seems like it could be done fairly
>> easily. Also, it could be the basis for work on merging indices.
>>
>> Paul
>>
>> On Mon, Jul 28, 2008 at 5:31 PM, Dean Landolt <dean@deanlandolt.com> wrote:
>>> I updated http://wiki.apache.org/couchdb/FullTextIndexWithView with a
>>> slightly more robust implementation. Still no boolean abilities though --
>>> I'm combing the internets trying to figure out how Google does it in m/r,
>>> but my best guess is they just brute-force the merge (and probably track
>>> some stats to guess a total). This doesn't seem like something that would
>>> lend itself easily to couch -- but I could be wrong. I'm probably wrong.
>>> Please, someone tell me I'm wrong...
>>>
>>> Dean
>>>
>>> On Mon, Jul 28, 2008 at 1:18 PM, Dean Landolt <dean@deanlandolt.com> wrote:
>>>
>>>> Gladly. I'll get it on the wiki and send a link after I clean it up.
>>>>
>>>> Regarding merging views, something like that would be fantastic, though I
>>>> can't really comprehend the performance implications. If a view can peer
>>>> into another view for its processing, I gather this would mean it would
>>>> have to be updated every time a change happens in the referenced view(s),
>>>> and an incremental update here may really mean a full update of the view
>>>> in question, but I'm just guessing. Though this would allow real *joins*
>>>> and end that whole question once and for all... :)
>>>>
>>>>
>>>>
>>>> On Sun, Jul 27, 2008 at 7:04 PM, Dan Reverri <reverri@gmail.com> wrote:
>>>>
>>>>> Dean,
>>>>>
>>>>> Any chance you want to share your view code?
>>>>>
>>>>> In regards to the query parsing, I am not sure how this will work. Right
>>>>> now results for each term have to be pulled down to the client and merged
>>>>> together. Perhaps we could add a query method to views that allows
>>>>> different key values to be combined.
>>>>>
>>>>> A user could query a view with a set of keys and a merge function that
>>>>> could define how the key values could be combined.
>>>>>
>>>>> On Fri, Jul 25, 2008 at 5:01 PM, Dean Landolt <dean@deanlandolt.com> wrote:
>>>>>
>>>>>> On Mon, Jul 21, 2008 at 11:45 AM, Dean Landolt <dean@deanlandolt.com> wrote:
>>>>>>
>>>>>>> On Mon, Jul 21, 2008 at 1:08 AM, Dan Reverri <reverri@gmail.com> wrote:
>>>>>>>
>>>>>>>> Is it worthwhile to implement a full text indexer on top of CouchDB's
>>>>>>>> map/reduce functionality?
>>>>>>>>
>>>>>>>> http://wiki.apache.org/couchdb/FullTextIndexWithView
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Interesting idea. There's definitely more to FTI than tokenization alone,
>>>>>>> but then again there's an awful lot of power in m/r and JavaScript -- it
>>>>>>> didn't take me a second to find a Porter stemming algorithm in js:
>>>>>>> http://tartarus.org/~martin/PorterStemmer/js.txt
>>>>>>>
>>>>>>> I bet variable weighting would be pretty close to impossible in the m/r
>>>>>>> paradigm though, and probably some other features (of course, I could be
>>>>>>> wrong, and when it comes to CouchDB, thus far I usually am). For a
>>>>>>> straight-up word search, this is serviceable as is. I'm going to see if I
>>>>>>> can't figure out how to shoehorn in some boolean features.
>>>>>>>
>>>>>>
>>>>>> I gave this approach another look and I was able to get a view together
>>>>>> that did a little more (stemming, optional case-insensitivity, min length
>>>>>> for tokens, better whitespace handling). I'm working on an ngram view too
>>>>>> and so far it's promising. But there's still one huge problem -- for the
>>>>>> life of me I can't figure out a workable strategy for boolean operations
>>>>>> that doesn't involve fully loading each piece of the query. Am I missing
>>>>>> something? Is something like this even possible? I know there's no way to
>>>>>> load a piece of a view from another view -- but I just can't help but
>>>>>> really wish there were.
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
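
For readers following the thread, here is a rough sketch of the approach it describes: tokenize each document into (token, doc id) pairs, then pull the result set for each query term to the client and merge the sets there for boolean AND/OR. It is plain Python with an in-memory dict standing in for the view, not the wiki page's view code. The function names, the sample documents and the min_length default are all made up for illustration, and no stemming is done.

```python
import re
from collections import defaultdict


def tokenize(text, min_length=3, case_sensitive=False):
    """Split on runs of non-word characters and drop short tokens."""
    if not case_sensitive:
        text = text.lower()
    return [t for t in re.split(r"\W+", text) if len(t) >= min_length]


def build_index(docs, **options):
    """Stand-in for the map step: collect the doc ids seen for each token."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text, **options):
            index[token].add(doc_id)
    return index


def query_and(index, terms):
    """Doc ids that contain every term: intersect the per-term sets."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def query_or(index, terms):
    """Doc ids that contain at least one term: union the per-term sets."""
    return set().union(*(index.get(term, set()) for term in terms))


# Hypothetical sample documents, keyed by doc id.
docs = {
    "doc1": "CouchDB views use map/reduce",
    "doc2": "Erlang powers CouchDB",
    "doc3": "Thinking in Erlang",
}

index = build_index(docs)
both = query_and(index, ["couchdb", "erlang"])
either = query_or(index, ["couchdb", "erlang"])
```

Query terms have to be normalized the same way as the indexed tokens (lowercased here). Every per-term set is loaded in full before the merge, which is exactly the cost the thread is trying to avoid.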

