lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Itamar Syn-Hershko" <ita...@divrei-tora.com>
Subject RE: Language identification ??
Date Fri, 14 Mar 2008 14:30:39 GMT

For what it worths, I did something similar in my BidiAnalyzer so I can
index both Hebrew/Semitic texts and English/Latin words without switching
analyzers, giving each the proper treatment. I did it simply by testing the
first char and looking at its numeric value - so it falls between Hebrew
Aleph and Taph then its Hebrew, else its Latin. I wonder how you would spot
a French word in an English text for instance (aren't there parallel words?)

Itamar.

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org] 
Sent: Friday, March 14, 2008 3:34 PM
To: java-user@lucene.apache.org
Subject: Re: Language identification ??

I think Karl Wettin has one that is a patch in JIRA.  Try searching there.

On Mar 14, 2008, at 1:28 AM, Raghu Ram wrote:

> Hi all,
>  I guess this question is  a bit off the track. Are there any language 
> identification modules inside Lucene ??? If not can somebody please 
> suggest me a good one.
>
> Thank You.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message