The best Analyzer documentation so far is Erik Hatcher's "Parser Rulez"
article. The link is on the Resources page of Lucene's site.
Looking forward to the contribution.
Otis
--- karl wettin <kalle@snigel.dnsalias.net> wrote:
>
> Hello list,
>
> I'm Karl, and I just started testing Lucene the other day. It's a great
> core engine, but I feel there are some things missing that I'd be happy
> to contribute.
>
> I started by writing a simple N-gram classifier to detect the language
> of a text in order to automatically cluster documents by language. The
> algorithm is very similar to the "TextCat" C library.
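For anyone following along, a TextCat-style classifier fits in a few dozen lines. Below is a minimal stdlib-only sketch, not Karl's code and not TextCat's own source: it builds ranked character n-gram profiles (n = 1..5, words padded with "_") and picks the language with the smallest "out-of-place" rank distance, with missing n-grams charged a fixed maximum penalty. The class name, the profile size of 300 and the "_" padding are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical TextCat-style guesser using ranked n-gram profiles. */
class NGramLanguageGuesser {
    private static final int MAX_N = 5;          // n-gram lengths 1..5 (assumed)
    private static final int PROFILE_SIZE = 300; // keep only the top-ranked n-grams (assumed)

    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    /** Builds a ranked profile: n-gram -> rank (0 = most frequent). */
    static Map<String, Integer> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^\\p{L}]+")) {
            if (word.isEmpty()) continue;
            String padded = "_" + word + "_"; // mark word boundaries
            for (int n = 1; n <= MAX_N; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Most frequent first; ties broken alphabetically so ranks are deterministic.
        entries.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> ranks = new HashMap<>();
        for (int r = 0; r < Math.min(PROFILE_SIZE, entries.size()); r++) {
            ranks.put(entries.get(r).getKey(), r);
        }
        return ranks;
    }

    /** Stores the ranked profile of a training sample under a language label. */
    public void train(String language, String sampleText) {
        profiles.put(language, profile(sampleText));
    }

    /** Out-of-place distance: sum of rank differences, max penalty for missing n-grams. */
    static int distance(Map<String, Integer> doc, Map<String, Integer> lang) {
        int d = 0;
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            Integer r = lang.get(e.getKey());
            d += (r == null) ? PROFILE_SIZE : Math.abs(r - e.getValue());
        }
        return d;
    }

    /** Returns the trained language whose profile is closest to the text's profile. */
    public String classify(String text) {
        Map<String, Integer> doc = profile(text);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            int d = distance(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Real profiles would be trained on much larger samples per language; the short strings one would use in a quick test only show the mechanics.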
>
> And then I thought, maybe it would be possible to use the same N-gram
> classifier to make an automatic stemmer that works on all languages.
> Hopefully I'll have something up and running for tests by next weekend.
>
> The same classifier could be used for a simple metaphone index.
>
> However, I need some help understanding the Analyzer. Where can I
> find some tutorials on how to write my own? I didn't check with Google;
> maybe I should have before posting here. Since the stemmer (and metaphone)
> data would have to be indexed in their own field(?), querying the stemmed
> field would require one to stem the query too. Can I create a subclass of
> Query (or similar), or do I need to create my own Query class that handles
> the stemming all the way for the user? The latter is my current approach,
> so I would appreciate some hints and pointers here.
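On the query side: in Lucene the usual pattern is to hand QueryParser the same Analyzer that IndexWriter used, so the query text is stemmed the same way the field was, and a custom Query class is often unnecessary for this. The plain-Java sketch below only illustrates that idea outside Lucene; it is not Lucene's API. The class name StemmedFieldIndex and the naive suffix-stripping stem() are hypothetical stand-ins for a real Analyzer and stemmer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy illustration: one analysis chain shared by indexing and querying. */
class StemmedFieldIndex {
    // Hypothetical stand-in for a real stemmer: strips a few English suffixes.
    static String stem(String token) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    /** Tokenize, lowercase, stem: the same steps run on documents and on queries. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) terms.add(stem(token));
        }
        return terms;
    }

    // Inverted index for the "stemmed" field: term -> ids of documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes a document under its analyzed (stemmed) terms. */
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    /** OR query: the raw query text goes through analyze() before lookup. */
    Set<Integer> search(String queryText) {
        Set<Integer> hits = new HashSet<>();
        for (String term : analyze(queryText)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }
}
```

With this setup a query for "indexed" reduces to "index" and matches a document that contained "Indexing", which a raw, unanalyzed lookup would miss.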
>
>
> Great project!
>
>
> karl
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>