couchdb-user mailing list archives

From Robert Dionne <dio...@dionne-associates.com>
Subject Re: Full text search - is it coming? If yes, approx when.
Date Sat, 02 Apr 2011 12:15:17 GMT
Yes, I agree wholeheartedly with this view. I would go further and speculate (as I haven't kept up with Lucene
features) that Lucene is still primarily focused on lexical techniques and that any real NLP functionality is
done in Lucene plugins.

The core idea in full-text indexing (FTI) is simple: an inverted index. Quick and dirty search[1] is precisely the
use case I had in mind. I wanted something like the "jumpTo" functionality in Futon that would work for more than
just _ids, with the capability to filter the results by various fields on the documents and to index incrementally
as updates occur. This isn't that difficult, and CouchDB isn't so hard to build plugins for, though one does need to
grok some of the internals. It's also somewhat surprising that a database admin GUI doesn't have this simple capability.
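
The inverted-index idea above can be sketched in a few lines of Python. This is a hypothetical, in-memory
illustration, not the bitstore code linked in [1]: the class name, the `fields` parameter and the regex-based
tokenizer are assumptions. It shows the three pieces described: a very simple lexer, per-document incremental
updates (as an indexer following the _changes feed might apply them), and filtering hits by document fields.

```python
import re
from collections import defaultdict

# Hypothetical "really simple lexer": lowercase the text and keep runs of
# ASCII letters and digits. A real analyzer (e.g. Lucene's) does far more.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """In-memory inverted index: term -> set of document _ids."""

    def __init__(self, fields):
        self.fields = fields              # document fields to index (assumption)
        self.postings = defaultdict(set)  # term -> {_id, ...}
        self.doc_terms = {}               # _id -> terms indexed for that doc
        self.docs = {}                    # _id -> latest doc, used by field filters

    def update(self, doc):
        """Index a new or changed document, replacing whatever it had before."""
        doc_id = doc["_id"]
        self.delete(doc_id)
        terms = set()
        for field in self.fields:
            value = doc.get(field)
            if isinstance(value, str):
                terms.update(tokenize(value))
        for term in terms:
            self.postings[term].add(doc_id)
        self.doc_terms[doc_id] = terms
        self.docs[doc_id] = doc

    def delete(self, doc_id):
        """Drop a document's postings, e.g. when a deletion arrives."""
        for term in self.doc_terms.pop(doc_id, set()):
            ids = self.postings[term]
            ids.discard(doc_id)
            if not ids:
                del self.postings[term]
        self.docs.pop(doc_id, None)

    def search(self, query, **filters):
        """Sorted _ids that contain every query term and match every field filter."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(
            doc_id
            for doc_id in hits
            if all(self.docs[doc_id].get(k) == v for k, v in filters.items())
        )


if __name__ == "__main__":
    idx = InvertedIndex(fields=["title", "body"])
    idx.update({"_id": "a", "type": "post", "title": "Full text search", "body": "Is it coming?"})
    idx.update({"_id": "b", "type": "reply", "title": "Re: search", "body": "Try couchdb-lucene."})
    print(idx.search("search"))                # ['a', 'b']
    print(idx.search("search", type="reply"))  # ['b']
    # An incremental update replaces the old terms for "a".
    idx.update({"_id": "a", "type": "post", "title": "Views", "body": "startkey and endkey"})
    print(idx.search("search"))                # ['b']
```

Keeping the latest document beside the postings is simply the shortest way to support field filters in a sketch;
a real plugin would more likely index those fields separately and drive update/delete from the _changes feed.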

Doing this in a way that scales to large databases, is distributed, and has all the wonderful features
Lucene has acquired, and so forth, is a different matter.

Anyway, I did this as much as an Erlang learning exercise, and I'm planning to revisit it now that
I've picked up some skills.

Cheers,

Bob

[1] https://github.com/bdionne/bitstore/tree/master/src/search


On Mar 28, 2011, at 11:17 AM, Olafur Arason wrote:

> I love the power of Lucene, but it's not needed for many use cases,
> and it can even be gutted, as Cloudant is doing with their search,
> which uses the lexer from Lucene.
> 
> But most of the time people need quick and dirty search, and
> perhaps search integrated with views. For that you could have
> a really simple lexer, built in. If people need more
> power, they can use Lucene.
> 
> It's like taking a Ferrari to the store: it's cool, but overkill.
> 
> Hope you keep up the good work, couchdb-lucene is really easy
> to use.
> 
> Regards,
> Olafur Arason
> 
> P.S. I was talking to an NLP expert and realized that there is so
> much to searching, especially doing it right, that I think nobody
> will be able to reimplement Lucene anytime soon.
> 
> On Mon, Mar 28, 2011 at 14:30, Robert Newson <robert.newson@gmail.com> wrote:
>> I am a CouchDB committer and author of couchdb-lucene. :)
>> 
>> B.
>> 
>> On 28 March 2011 10:44, Andrew Stuart (SuperCoders)
>> <andrew.stuart@supercoders.com.au> wrote:
>>> Hi Robert
>>> 
>>> "there are no publicly known plans to build a native full-text indexing
>>> feature for CouchDB."
>>> 
>>> I don't know who is who around here as yet - are you commenting from inside
>>> knowledge or as an end user/developer?
>>> 
>>> Thanks
>>> 
>>> 
>>> On 28/03/2011, at 8:24 PM, Robert Newson wrote:
>>> 
>>> I have to dispute "There does not seem to be much understanding that
>>> this could be a killer feature."
>>> 
>>> Obviously full-text search is a killer feature, but it's trivially
>>> available now via couchdb-lucene or elasticsearch.
>>> 
>>> What people are asking for is native full-text search which, to me, is
>>> essentially asking for an Erlang port of Lucene. We'd love this, but
>>> it's a huge amount of work. Continually asking others to do
>>> significant amounts of work is also wearying.
>>> 
>>> To replace a Lucene-based solution and match its quality and breadth
>>> is a huge chunk of work and is only necessary to satisfy people who,
>>> for various reasons, don't want to use Java.
>>> 
>>> To answer the original post, there are no publicly known plans to
>>> build a native full-text indexing feature for CouchDB.
>>> 
>>> B.
>>> 
>>> On 28 March 2011 10:15, Olafur Arason <olafura@olafura.com> wrote:
>>>> 
>>>> There does not seem to be much understanding that this could be a killer
>>>> feature. People are now relying on Lucene, which monitors the _changes
>>>> feed.
>>>> 
>>>> Cloudant has done its own implementation, which, as far as I can gather
>>>> from the information they have published, makes a view out of all your
>>>> words. They recommend a Java view because you can then reuse the lexer
>>>> from Lucene. Then I think they are reusing the view's reader to make
>>>> their queries. Their query syntax is similar to Lucene's.
>>>> They are still working on this, and I think they don't have much
>>>> incentive to open-source it right away. But they have open-sourced
>>>> their technology in the past, like BigCouch, so I think it's more a
>>>> matter of when rather than if.
>>>> 
>>>> I think this is a good solution for full-text search. But I think
>>>> the Java view does not have direct access to the data, so it could be
>>>> slow. Then again, Cloudant does clustering on view generation, so that helps.
>>>> 
>>>> But there is also a general problem with the current view system where
>>>> search technology could be used.
>>>> 
>>>> Views are really good at sorting, but people are using them to
>>>> do key matches, which they are not designed for. The startkey and
>>>> endkey parameters are for selecting sorted ranges and are not good for
>>>> matching, even though most resources online recommend them for that.
>>>> 
>>>> For example, when you query with:
>>>> startkey = ["key11", "key21"]
>>>> endkey = ["key19", "key21"]
>>>> 
>>>> you get ["key11", "key22"], ["key11", "key23"] ... ["key12", "key21"],
>>>> ["key12", "key22"] ...
>>>> which makes sense when looking up sorted ranges, but not when using it to
>>>> match keys. You can do a range match, but only on the last element of
>>>> the key (with the earlier elements fixed), never on two elements. So this
>>>> would work:
>>>> 
>>>> startkey = ["key21", "key11"]
>>>> endkey = ["key21", "key19"]
>>>> 
>>>> The current view interface could be augmented to accept queries,
>>>> which could make views much more powerful than they currently are,
>>>> rather than just using the keys for sorting and selecting which values
>>>> you want shown, which they are designed to do and do really well.
>>>> 
>>>> This would be a killer feature and could use the new infrastructure
>>>> from Cloudant search.
>>>> 
>>>> And don't tell me the Elastic or Lucene interface could do anything
>>>> close to this :)
>>>> 
>>>> Regards,
>>>> Olafur Arason
>>>> 
>>>> On Mon, Mar 28, 2011 at 04:31, Andrew Stuart (SuperCoders)
>>>> <andrew.stuart@supercoders.com.au> wrote:
>>>>> 
>>>>> It would be good to know if full text search is coming as a core feature
>>>>> and
>>>>> if yes, approximately when - does anyone know?
>>>>> 
>>>>> Even an approximate timeframe would be good.
>>>>> 
>>>>> thanks
>>>>> 
>>>> 
>> 
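
The key-range behaviour described in the quoted message from Olafur Arason can be reproduced without a running
server. The sketch below is a hypothetical Python model: it keeps view rows sorted and takes the slice between
startkey and endkey by binary search, including the endkey row (CouchDB's default, inclusive_end=true). Python's
list ordering stands in for CouchDB's view collation; that substitution is an assumption that holds for lowercase
ASCII string keys like these, not for collation in general. The sample pairs and the two "views" are illustrative;
CouchDB's actual query parameters are named startkey and endkey and take JSON-encoded values.

```python
import json
from bisect import bisect_left, bisect_right
from urllib.parse import urlencode

# Illustrative documents, each a pair of values (a, b).
PAIRS = [(a, b)
         for a in ("key10", "key11", "key12", "key19", "key20")
         for b in ("key20", "key21", "key22")]

# Two hypothetical views over the same pairs: one emits [a, b], the other [b, a].
# Assumption: Python's list ordering stands in for CouchDB's view collation,
# which agrees with it for lowercase ASCII string keys like these.
BY_AB = sorted([a, b] for a, b in PAIRS)
BY_BA = sorted([b, a] for a, b in PAIRS)


def view_range(rows, startkey, endkey):
    """Rows whose key k satisfies startkey <= k <= endkey (endkey inclusive)."""
    return rows[bisect_left(rows, startkey):bisect_right(rows, endkey)]


def view_query(startkey, endkey):
    """Query string for a view request; CouchDB expects JSON-encoded keys."""
    return urlencode({
        "startkey": json.dumps(startkey, separators=(",", ":")),
        "endkey": json.dumps(endkey, separators=(",", ":")),
    })


if __name__ == "__main__":
    # Range on the first element: the second element is NOT constrained,
    # so ["key11", "key22"] and ["key12", "key20"] come back too.
    print(view_range(BY_AB, ["key11", "key21"], ["key19", "key21"]))
    # First element fixed, range on the last element: a real match.
    print(view_range(BY_BA, ["key21", "key11"], ["key21", "key19"]))
    print(view_query(["key21", "key11"], ["key21", "key19"]))
```

Because rows are visited in one sorted pass between two keys, only the last element of a compound key can carry a
range; every element before it has to be pinned to a single value for the result to behave like a match.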

