incubator-couchdb-user mailing list archives

From Matt Goodall <matt.good...@gmail.com>
Subject Re: FTI engine by Joe
Date Thu, 10 Dec 2009 16:25:05 GMT
2009/12/10 Norman Barker <norman.barker@gmail.com>:
> I have been following http://lucene.apache.org/lucy/; it seems to be
> under active development
> (http://mail-archives.apache.org/mod_mbox/lucene-lucy-dev/), and I was
> wondering about using it as a replacement for Java Lucene.
>
> It might be useful to have a trade-off discussion about all three approaches:
> Java Lucene, Lucy, and an Erlang FTI.

For completeness, I've been using Xapian to index CouchDB docs,
although I've not integrated the FTI into the CouchDB server in the
way others do.

I index docs in response to _changes updates (via a Python process)
and query the index directly from my application process(es). Sometimes
I use the Xapian search results directly; sometimes I go on to ask
CouchDB for the matching documents by doing a bulk get of the doc ids
from the search. Generally speaking that works quite nicely, although it
does need a bit of setting up.
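A minimal sketch of the workflow described above, under stated assumptions: the _changes feed is requested with `include_docs=true`, only two hypothetical fields (`title` and `body`) are indexed, and a toy in-memory inverted index stands in for Xapian (a real setup would use Xapian's Python bindings instead). HTTP calls are left out; the sketch only builds the request paths and bodies, including the `_all_docs` bulk get with a `{"keys": [...]}` body.

```python
import json
import re
from collections import defaultdict


def tokenize(text):
    # Crude stand-in for Xapian's term generation: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


class InMemoryIndex:
    """Toy inverted index used in place of a Xapian database (assumption)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.doc_terms = {}               # doc id -> terms, so updates/deletes can unindex
        self.last_seq = 0                 # checkpoint for the next _changes request

    def remove(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def replace(self, doc_id, text):
        # Drop the old version's terms before indexing the new revision.
        self.remove(doc_id)
        terms = set(tokenize(text))
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND query: ids of docs containing every query term.
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


def changes_path(since):
    # Path (relative to the database URL) for the next batch of changes.
    return f"_changes?include_docs=true&since={since}"


def apply_changes(index, feed, fields=("title", "body")):
    # `fields` is a hypothetical choice of what to index.
    for row in feed["results"]:
        if row.get("deleted"):
            index.remove(row["id"])
        elif not row["id"].startswith("_design/"):
            doc = row.get("doc", {})
            text = " ".join(str(doc[f]) for f in fields if f in doc)
            index.replace(row["id"], text)
    index.last_seq = feed["last_seq"]


def bulk_get_request(doc_ids):
    # POST this body to /{db}/_all_docs?include_docs=true to fetch the hits.
    return "_all_docs?include_docs=true", json.dumps({"keys": list(doc_ids)})
```

Storing `last_seq` lets the indexing process resume from where it stopped instead of re-reading the whole feed.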

It would be really interesting to have an FTI built into CouchDB in
some way but, personally, I think it should be relatively lightweight
and Erlang-based.

- Matt


>
> Norman
>
> On Thu, Dec 10, 2009 at 3:33 AM, Robert Dionne
> <dionne@dionne-associates.com> wrote:
>> I think so, and have already done some integration work [1] using an earlier version
>> of some of this code from the Erlang book. There are still lots of design questions, such as
>> what to index and how to specify it, how to store indices, how to interact with the
>> view servers, etc.
>>
>> There is also already very solid FTI support for CouchDB using Lucene [2]. Lucene
>> is mature, proven and widely used, so this fits many use cases.
>>
>> There are a lot of other goodies in elib1; it's definitely worth a read for
>> Erlang programmers.
>>
>> Cheers,
>>
>> Bob
>>
>>
>> [1] http://github.com/bdionne/indexer
>> [2] http://github.com/rnewson/couchdb-lucene/
>>
>>
>>
>> On Dec 9, 2009, at 7:28 PM, Senthilkumar Peelikkampatti wrote:
>>
>>> I looked at Joe's mail about a suite of libraries, one of which is an
>>> FTI. Will this fit CouchDB's full-text requirements?
>>>
>>> ---------- Forwarded message ----------
>>> From: Joe Armstrong <erlang@gmail.com>
>>> Date: Wed, Dec 9, 2009 at 2:12 PM
>>> Subject: [erlang-questions] Announce: elib1
>>> To: Erlang <erlang-questions@erlang.org>
>>> Cc: zabrane3@gmail.com, dionne@dionne-associates.com
>>>
>>>
>>> Announcing elib1
>>>
>>> Elib1 was released today.
>>>
>>> Tomorrow I will present it at the Stockholm Erlounge.
>>>
>>> Elib1 is a library of Erlang modules and set of applications which use
>>> the modules.
>>>
>>> The Elib1 project now moves into phase 2.
>>>
>>> The phases of the project are:
>>>
>>>    Phase 1: Define and implement a basic structure
>>>             and a small number of applications
>>>    Phase 2: Make project open source
>>>    Phase 3: Write books
>>>
>>> Each phase will take about 2-3 years.
>>>
>>> The first attempt at a library contains modules for the following:
>>>
>>>    xml parsing
>>>    fast tuple I/O (to disk)
>>>    full-text indexing
>>>    http parsing
>>>    telnet server
>>>    json parsing
>>>    porter stemming
>>>    mysql native interface
>>>    sha1
>>>    similar file locator
>>>    screen manipulation
>>>    miscellaneous missing functions (which should be in the standard libraries)
>>>    accurate tagging of Erlang so it can be turned into browsable HTML
>>>    (and more ...)
>>>
>>> The applications are divided into two areas: supported and unsupported.
>>>
>>> In supported:
>>>
>>>    indexer      - a full-text indexing engine (this is of near production quality)
>>>    irc          - an IRC kit (includes a Tcl wish interface) (somewhat incomplete)
>>>    tagger       - an application to turn Erlang into browsable HTML
>>>    drivers      - example linked-in and port drivers (currently broken)
>>>    midi_drivers - Mac OS X only
>>>    website      - a webserver (used internally)
>>>    versions     - a way of munging module names to make them secure
>>>
>>> In unsupported:
>>>
>>>   epeg     - a PEG grammar and parser combinators
>>>   folding  - JavaScript folding editor/organiser (needs some work, not Erlang :-)
>>>   jpeg     - image transformation in Erlang
>>>   xml      - some XML stuff
>>>
>>> I have attempted to use "best practice" in making the library, using
>>> the Dialyzer, EUnit and EDoc.
>>>
>>> This code is far from perfect or polished, but the basic way things
>>> fit together is defined.
>>>
>>> Rather than have 500 small libraries, each with a few users and a few
>>> routines, I'd like to see one library with a much larger number of
>>> tightly integrated routines.
>>>
>>> The code is available at:
>>>
>>> http://github.com/joearms/elib1
>>>
>>> /Joe Armstrong
>>>
>>> ________________________________________________________________
>>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>>> erlang-questions (at) erlang.org
>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Senthilkumar Peelikkampatti,
>>> http://pmsenthilkumar.blogspot.com/
>>
>>
>
