lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Supported Languages
Date Thu, 10 Jun 2004 15:44:00 GMT
Hello Don,

--- Don Vaillancourt <donv@webimpact.com> wrote:
> Hi all,
> 
> I've noticed from the documentation that the Russian and German
> languages are supported by Lucene, but does Lucene support the
> French language?

It does, but classes for supporting French are not included in the
Lucene core, which is what you downloaded, most likely.

> What is the definition of support with regard to language for Lucene?
> Being able to index a document?  Or being able to search a document?
> Or is it simply being able to sort results?

Pretty much any text can be indexed with Lucene, but in order to index
a text properly, one has to tokenize the input properly (i.e. break
input into indexable units, like words).  In the process, the text has
to be encoded properly using Java's character encoding mechanisms. 
Optionally, one may want to have support for stemming (cutting (hah!)
tokens down to their roots), etc.
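
To make the encoding and tokenizing steps concrete, here is a minimal
sketch in plain Java. It does not use Lucene's analyzer classes; the
class name FrenchTokenDemo and the tokenize helper are made up for
illustration, and splitting on non-letters is only a rough stand-in for
what a real analyzer does. It also assumes the input bytes are
ISO-8859-1; use whatever encoding your documents are really in. No
stemming is attempted here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Lucene.
class FrenchTokenDemo {

    // Breaks the character stream into lowercase tokens, treating any
    // run of letters as one token and everything else as a separator.
    static List<String> tokenize(Reader reader) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase((char) c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a French document stored on disk as ISO-8859-1 bytes.
        // Unicode escapes keep this source file pure ASCII.
        byte[] raw = "L'\u00e9t\u00e9 \u00e0 Paris, les chevaux galopent."
                .getBytes(StandardCharsets.ISO_8859_1);

        // Name the charset explicitly when turning bytes into characters;
        // relying on the platform default is what usually mangles accents.
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.ISO_8859_1)) {
            System.out.println(tokenize(reader));
        }
    }
}
```

If the bytes are decoded with the wrong charset, the accented letters
turn into other characters and the tokens no longer match what a user
types at search time, which is why the decoding step matters as much as
the tokenizing one.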

If you want to index and search French text, I suggest you look at the
Snowball Analyzers section of Lucene Sandbox.  You will have to check
that out of CVS and build a JAR yourself, though, as we currently
don't have a mechanism in place that does this for you.

I hope this helps.

Otis

=====
http://www.simpy.com/ - social bookmarking and personal search engine


