couchdb-user mailing list archives

From Benoit Chesneau <bchesn...@gmail.com>
Subject Re: Full text search - is it coming? If yes, approx when.
Date Mon, 28 Mar 2011 15:59:33 GMT
It would be cool to have a NIF integration of Apache Lucy. It might solve
the problem.

- benoit

On Mon, Mar 28, 2011 at 5:17 PM, Olafur Arason <olafura@olafura.com> wrote:
> I love the power of Lucene, but it's not needed for many use cases,
> and it can even be stripped down, as Cloudant is doing with their search,
> which uses the lexer from Lucene.
>
> But most of the time people need quick and dirty search, and
> even search integration with views. For that, a really simple
> built-in lexer might be enough. People who need more power
> would use Lucene.
>
> It's like using a Ferrari to go to the store: it's cool but overkill.
>
> Hope you keep up the good work, couchdb-lucene is really easy
> to use.
>
> Regards,
> Olafur Arason
>
> P.S. I was talking to an NLP expert and I realized that there is so
> much to searching, especially doing it right, that I think nobody
> will be able to re-implement Lucene anytime soon.
>
> On Mon, Mar 28, 2011 at 14:30, Robert Newson <robert.newson@gmail.com> wrote:
>> I am a CouchDB committer and author of couchdb-lucene. :)
>>
>> B.
>>
>> On 28 March 2011 10:44, Andrew Stuart (SuperCoders)
>> <andrew.stuart@supercoders.com.au> wrote:
>>> Hi Robert
>>>
>>> "there are no publicly known plans to build a native full-text indexing
>>> feature for CouchDB."
>>>
>>> I don't know who is who around here as yet - are you commenting from inside
>>> knowledge or as an end user/developer?
>>>
>>> Thanks
>>>
>>>
>>> On 28/03/2011, at 8:24 PM, Robert Newson wrote:
>>>
>>> I have to dispute "There does not seem to be much understanding that
>>> this could be a killer feature."
>>>
>>> Obviously full-text search is a killer feature, but it's trivially
>>> available now via couchdb-lucene or elasticsearch.
>>>
>>> What people are asking for is native full-text search which, to me, is
>>> essentially asking for an Erlang port of Lucene. We'd love this, but
>>> it's a huge amount of work. Continually asking others to do
>>> significant amounts of work is also wearying.
>>>
>>> To replace a Lucene-based solution and match its quality and breadth
>>> is a huge chunk of work and is only necessary to satisfy people who,
>>> for various reasons, don't want to use Java.
>>>
>>> To answer the original post, there are no publicly known plans to
>>> build a native full-text indexing feature for CouchDB.
>>>
>>> B.
>>>
>>> On 28 March 2011 10:15, Olafur Arason <olafura@olafura.com> wrote:
>>>>
>>>> There does not seem to be much understanding that this could be a killer
>>>> feature. People are now relying on Lucene which monitors the _changes
>>>> feed.
>>>>
>>>> Cloudant has done its own implementation which, as far as I can gather
>>>> from the information they have published, makes a view out of all your
>>>> words. They recommend a Java view because you can then reuse the lexer
>>>> from Lucene. I think they then reuse the view reader to run their
>>>> queries. Their query interface has a syntax similar to Lucene's.
>>>> They are still working on this, and I think they don't have that much
>>>> incentive to open-source it right away. But they have open-sourced
>>>> their technology in the past, like BigCouch, so I think it's more a
>>>> matter of when rather than if.
>>>>
>>>> I think this is a good solution for full-text search. But I think
>>>> the Java view does not have direct access to the data, so it could be
>>>> slow. Cloudant does clustering on view generation, though, so that helps.
>>>>
>>>> But there is also a general problem with the current view system where
>>>> search technology could be used.
>>>>
>>>> Views are really good at sorting, but people are using them to
>>>> do key matches, which they are not designed for. The beginkey and
>>>> endkey parameters select sorted ranges and are not good for matching,
>>>> even though most resources online recommend them for that.
>>>>
>>>> For example when you do:
>>>> beginkey = ["key11", "key21"]
>>>> endkey = ["key19", "key21"]
>>>>
>>>> You get ["key11","key22"], ["key11", "key23"] ... ["key12","key21"],
>>>> ["key12","key22"]...
>>>> which makes sense for looking up sorted ranges, but not for matching
>>>> keys. You can do a range match, but only on the last key element and
>>>> never on two elements at once. So this would work:
>>>>
>>>> beginkey = ["key21", "key11"]
>>>> endkey = ["key21", "key19"]
>>>>
>>>> The current view interface could be augmented to accept queries,
>>>> which would make views much more powerful than they currently are,
>>>> while the keys are used just for sorting and for selecting which values
>>>> you want shown, which is what they are designed to do and do really well.
>>>>
>>>> This would be a killer feature and could use the new infrastructure
>>>> from Cloudant search.
>>>>
>>>> And don't tell me the Elastic or Lucene interface could do anything
>>>> close to this :)
>>>>
>>>> Regards,
>>>> Olafur Arason
>>>>
>>>> On Mon, Mar 28, 2011 at 04:31, Andrew Stuart (SuperCoders)
>>>> <andrew.stuart@supercoders.com.au> wrote:
>>>>>
>>>>> It would be good to know if full text search is coming as a core
>>>>> feature and if yes, approximately when - does anyone know?
>>>>>
>>>>> Even an approximate timeframe would be good.
>>>>>
>>>>> thanks
>>>>>
>>>>
>>
>
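
The approach Olafur attributes to Cloudant further down the thread ("makes a
view out of all your words") can be illustrated with a small, self-contained
sketch. This is not Cloudant's or CouchDB's code: the function names, the
"text" field, and the regex tokenizer (standing in for the Lucene lexer) are
all assumptions made for illustration.

```python
import bisect
import re


def map_words(doc):
    """Hypothetical map function: emit (word, doc_id) once per distinct word.

    The regex is a stand-in for a real lexer such as Lucene's.
    """
    text = doc.get("text", "").lower()
    for word in sorted(set(re.findall(r"[a-z0-9]+", text))):
        yield word, doc["_id"]


def build_view(docs):
    """Collect all emitted rows and keep them sorted by key, like a view index."""
    return sorted(row for doc in docs for row in map_words(doc))


def lookup(rows, word):
    """Exact-key lookup: seek to the first matching row, read until the key changes."""
    word = word.lower()
    start = bisect.bisect_left(rows, (word,))
    hits = []
    for key, doc_id in rows[start:]:
        if key != word:
            break
        hits.append(doc_id)
    return hits
```

A query for a single word is then an exact-key read of the sorted rows.
Stemming, ranking, and phrase queries are exactly the parts this sketch
leaves out, which is the work the thread says Lucene already does.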

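Olafur's compound-key example can also be reproduced without a server. The
sketch below assumes that plain Python list comparison orders these simple
ASCII keys the same way CouchDB's view collation does, and that the end key
is inclusive. The helper name range_query is made up for illustration. Note
that CouchDB's HTTP view API calls the lower bound startkey; the sketch keeps
the thread's "beginkey" wording.

```python
def range_query(keys, beginkey, endkey):
    """Return every key k with beginkey <= k <= endkey, in sorted order."""
    return [k for k in sorted(keys) if beginkey <= k <= endkey]


# Keys as a view might emit them: [first, second].
keys = [[a, b]
        for a in ("key11", "key12", "key19")
        for b in ("key21", "key22", "key23")]

# Ranging over the first element while trying to "fix" the second:
# the second element only bounds the two ends of the range.
wide = range_query(keys, ["key11", "key21"], ["key19", "key21"])

# Same data with the element order swapped: [second, first].
# Now the fixed element leads and the range runs over the last element.
swapped = [[b, a] for a, b in keys]
narrow = range_query(swapped, ["key21", "key11"], ["key21", "key19"])

print(wide)
print(narrow)
```

The first query returns rows such as ["key11", "key22"] whose second element
is not "key21", while the second query returns only rows that start with
"key21". That is the asymmetry the message describes: a range works on the
last element of the key, not on two elements at once.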