lucene-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: N-gram layer
Date Sun, 01 Feb 2004 21:12:32 GMT
The best Analyzer documentation so far is Erik Hatcher's "Parser Rulez"
article.  The link is on the Resources page of Lucene's site.

Looking forward to the contribution.

Otis


--- karl wettin <kalle@snigel.dnsalias.net> wrote:
> 
> Hello list,
> 
> I'm Karl, and I just started testing Lucene the other day. It's a great
> core engine, but I feel there are some things missing that I'd be happy
> to contribute.
> 
> I started by writing a simple N-gram classifier to detect the language
> of a text in order to automatically cluster documents by language. The
> algorithm is very similar to the "TextCat" C library.
> 
> And then I thought, maybe it would be possible to use the same N-gram
> classifier to make an automatic stemmer that works on all languages.
> Hopefully I'll have something up and running for tests by next weekend.
> 
> The same classifier could be used for a simple metaphone index.
> 
> However, I need some help understanding the Analyzer. Where can I
> find some tutorials on how to write my own? I didn't check with Google;
> maybe I should before posting here. Since the stemmer (and metaphone)
> data would have to be indexed in their own field(?), querying the
> stemmed field would require one to stem the query too. Can I create a
> subclass of Query (or so), or do I need to create my own Query class
> that handles the stemming all the way for the user? The last option is
> my current approach, so I would appreciate some hints and pointers here.
> 
> 
> Great project! 
> 
> 
> karl
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
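
For readers unfamiliar with the TextCat-style approach mentioned in the quoted
message, here is a rough sketch of rank-order N-gram language guessing. It is
not Karl's code and does not use Lucene. The function names, the underscore
word padding, and the max_n/top_k defaults are all assumptions made for
illustration, and the tiny training strings are placeholders for real corpora.

```python
from collections import Counter

def ngram_profile(text, max_n=3, top_k=300):
    """Rank the character n-grams (lengths 1..max_n) of a text, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # assumed convention: mark word boundaries with "_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so results are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )

def detect_language(text, profiles):
    """Return the key of the profile with the smallest out-of-place distance."""
    doc_profile = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang]))

# Placeholder training data; a real classifier needs far larger samples.
ENGLISH_SAMPLE = "the quick brown fox jumps over the lazy dog and then the dog sleeps"
SWEDISH_SAMPLE = "den snabba bruna räven hoppar över den lata hunden och sedan sover hunden"

PROFILES = {
    "en": ngram_profile(ENGLISH_SAMPLE),
    "sv": ngram_profile(SWEDISH_SAMPLE),
}

if __name__ == "__main__":
    print(detect_language("the dog and the fox", PROFILES))
```

The same ranked profiles could in principle be compared per document to group
texts by language, which is the clustering use the quoted message describes.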

