lucene-dev mailing list archives

From David Balmain <dbalmain...@gmail.com>
Subject Re: Implementation in C & Some Questions
Date Sat, 12 Nov 2005 07:08:08 GMT
Hi Robert,

I'm very interested in this. I've ported the indexing part of Lucene
to C myself. Currently it's not portable (it runs only on *nix), but it
does implement file locking. I'm mostly curious to see how you solved
some of the problems I came across and how your performance compares
to the Java version. I'm probably not going to put mine under an
Apache-style licence, but you are free to have a look at it. If you
have Subversion:

svn co svn://www.davebalmain.com/cferret/trunk

Regards,
Dave

On 11/12/05, Robert Kirchgessner <rokirch@gmx.net> wrote:
> Hi,
>
> please excuse me if I'm completely wrong here,
> I know there is a Lucene4c in the Incubator, but there
> doesn't seem to be much traffic on its mailing list.
>
> First I want to thank all people involved in the
> project for this great software.
>
> I've made a port of Lucene to C. We use it in a
> corporate environment as a PHP module and as a
> standalone CGI in pure C for indexing and searching.
> It's a great success. The code is Open Source
> (Apache License).
>
> The current implementation supports the following:
>
> - indexing and searching in pure C
> - binary storage
> - omit norms
> - portable, runs on Linux, Windows, Mac, .... (it's ANSI C ... I hope)
> - unit tested code
> - no known memory leaks ( used valgrind for checking )
>
> It lacks some features:
>
> - Unicode support (was a deliberate decision, but may be fixed in future)
> - no thread support
> - no file locking
> - no publicly available QueryParser (we've written a highly specific
> version for our corporate needs, but it would be an easy task for us to
> write a version matching Java Lucene. We use re2c for regular
> expressions and the lemon parser generator)
> - as we use Lucene in a German environment, there is no support
> for other languages.
> - no span, range, wildcard and fuzzy search yet.
>
> I am now updating the code to the latest development version of
> Java Lucene. Along the way I am introducing consistent memory management
> with apr-pools (APR - Apache Portable Runtime -> it's great! ),
> apr-like error handling with apr_status_t and apr-like code conventions.
>
> So here are my questions:
>
> 1. Is anyone interested in this? If so, what's the best way to share
> the sources? A very early version of the project is on SourceForge. I
> could check in the current sources there.
>
> 2. I'd like to keep the development of the C code in sync with Java.
> As we use this C library very heavily at our company, we have
> some ideas for extending the search engine. Some of them are:
>
> - support for separately stored fields, e.g. like norms in the current
> implementation
> - support for binary fields of fixed length (e.g. for purposes of sorting,
> numerical comparison, optimization of memory consumption and
> fast file access)
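
A minimal sketch of why fixed-length binary fields would give fast file
access: with a constant record width, the value for a document sits at a
computable offset, so no per-document lookup table is needed. The names
(fixed_field_offset, fixed_field_read_u32) and the 4-byte big-endian
layout are illustrative assumptions, not part of Lucene or of the C port
described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one fixed-width record per document, stored
 * back to back, so the record for doc_id starts at doc_id * width. */
static size_t fixed_field_offset(uint32_t doc_id, size_t width)
{
    return (size_t)doc_id * width;
}

/* Read a 4-byte big-endian value for doc_id from an in-memory copy of
 * the field file. Returns 0 on success, -1 if doc_id is out of range. */
static int fixed_field_read_u32(const unsigned char *data, size_t len,
                                uint32_t doc_id, uint32_t *out)
{
    size_t off;

    assert(data != NULL && out != NULL);
    off = fixed_field_offset(doc_id, 4);
    if (len < 4 || off > len - 4)
        return -1;
    *out = ((uint32_t)data[off] << 24)
         | ((uint32_t)data[off + 1] << 16)
         | ((uint32_t)data[off + 2] << 8)
         |  (uint32_t)data[off + 3];
    return 0;
}
```

Because every record has the same width, a sort comparator could fetch
the values for two documents with two such reads and compare them
numerically, without parsing variable-length stored fields.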
>
> Further, I have many questions concerning the Java implementation, such as:
>
> - Why store the tokenized, binary and compressed flags in the field data
> instead of in the field info as global field attributes? Where these
> attributes are constant for a field, this consumes a byte per document,
> which could be saved if they were stored in the field info file.
>
> - Why assume that NO_NORMS for a field implies that the
> field is not tokenized with an analyzer:
>
> >    /** Index the field's value without an Analyzer, and disable
> >     * the storing of norms.  No norms means that index-time boosting
> >     * and field length normalization will be disabled.  The benefit is
> >     * less memory usage as norms take up one byte per indexed field
> >     * for every document in the index.
> >     */
> >    public static final Index NO_NORMS = new Index("NO_NORMS");
>
> We have many use cases in our applications where we omit norms while
> tokenizing fields (e.g. think of a use case such as retrieving hits with
> custom sorting that depends on some field).
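
Both questions above come down to treating tokenization and norms as
separate per-field properties. A hedged sketch of how a C port might
model that, assuming a hypothetical field_info struct with a flags byte
(the names and bit values are illustrative and do not reflect Lucene's
file format): the flags are stored once per field rather than once per
document, and "tokenized" can be combined freely with "omit norms".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-field flag bits; the values are illustrative only. */
enum {
    FI_TOKENIZED  = 1 << 0,
    FI_BINARY     = 1 << 1,
    FI_COMPRESSED = 1 << 2,
    FI_OMIT_NORMS = 1 << 3
};

struct field_info {
    const char *name;
    uint8_t flags;   /* stored once per field, not once per document */
};

static int fi_is_tokenized(const struct field_info *fi)
{
    return (fi->flags & FI_TOKENIZED) != 0;
}

static int fi_has_norms(const struct field_info *fi)
{
    return (fi->flags & FI_OMIT_NORMS) == 0;
}

/* Norms take one byte per document for every field that keeps them. */
static size_t norms_bytes(const struct field_info *fields,
                          size_t n_fields, size_t n_docs)
{
    size_t with_norms = 0;
    size_t i;

    for (i = 0; i < n_fields; i++)
        if (fi_has_norms(&fields[i]))
            with_norms++;
    return with_norms * n_docs;
}

/* Flag storage if one flag byte is written per field in every document... */
static size_t flag_bytes_per_document(size_t n_fields, size_t n_docs)
{
    return n_fields * n_docs;
}

/* ...versus one flag byte per field in the field info. */
static size_t flag_bytes_per_field(size_t n_fields)
{
    return n_fields;
}
```

With two fields and 1000 documents, the per-document layout spends 2000
bytes on flags against 2 bytes in the field info, under the assumption
that the flags never vary between documents.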
>
> Can we discuss such questions in this mailing list? If these discussions
> result in some decisions, it would be no problem for me to implement
> some ideas in Java.
>
> Thank you very much in advance,
>
> Robert Kirchgessner
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


