lucene-java-user mailing list archives

From "Itamar Syn-Hershko" <>
Subject RE: Language identification ??
Date Fri, 14 Mar 2008 14:30:39 GMT

For what it's worth, I did something similar in my BidiAnalyzer so I can
index both Hebrew/Semitic texts and English/Latin words without switching
analyzers, giving each the proper treatment. I did it simply by testing the
first char of each word and looking at its numeric value: if it falls between
Hebrew Aleph and Taph, it's Hebrew; otherwise it's Latin. I wonder how you
would spot a French word in an English text, for instance. Aren't there words
that exist in both languages?
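The first-character test described above can be sketched roughly as follows. This is not the BidiAnalyzer code itself; the class, enum, and method names are hypothetical. It assumes the range check uses Aleph (U+05D0) and Tav/Taph (U+05EA), the first and last Hebrew base letters, and that empty tokens fall through to the Latin branch.

```java
/**
 * Hypothetical sketch of a first-character script test: a token whose first
 * char lies between Hebrew Aleph and Tav is treated as Hebrew, anything else
 * as Latin. Not the original BidiAnalyzer implementation.
 */
class ScriptGuess {

    // First and last Hebrew base letters: Aleph (U+05D0) and Tav (U+05EA).
    private static final char ALEPH = '\u05D0';
    private static final char TAV = '\u05EA';

    enum Script { HEBREW, LATIN }

    static Script classify(String token) {
        if (token == null || token.isEmpty()) {
            // Assumption: empty or missing tokens go to the Latin branch.
            return Script.LATIN;
        }
        char first = token.charAt(0);
        return (first >= ALEPH && first <= TAV) ? Script.HEBREW : Script.LATIN;
    }

    public static void main(String[] args) {
        // "shalom" written in Hebrew letters, then an English word.
        System.out.println(classify("\u05E9\u05DC\u05D5\u05DD")); // HEBREW
        System.out.println(classify("hello"));                    // LATIN
    }
}
```

A wider check such as `Character.UnicodeBlock.of(first) == Character.UnicodeBlock.HEBREW` would also accept vowel points and other marks in the Hebrew block; the narrow Aleph..Tav range mirrors the description above. Note that a script test like this cannot separate two languages that share an alphabet, such as French and English.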


-----Original Message-----
From: Grant Ingersoll [] 
Sent: Friday, March 14, 2008 3:34 PM
Subject: Re: Language identification ??

I think Karl Wettin has posted one as a patch in JIRA. Try searching there.

On Mar 14, 2008, at 1:28 AM, Raghu Ram wrote:

> Hi all,
>  I guess this question is a bit off track. Are there any language
> identification modules in Lucene? If not, can somebody please
> suggest a good one?
> Thank you.
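For languages that share a script, a common statistical approach is to compare character n-gram statistics rather than code-point ranges. The sketch below is a generic character-trigram naive Bayes guesser with add-one smoothing; it is not Karl Wettin's JIRA patch, whose API is not shown here. The class name, the language keys, and the one-sentence training samples are all hypothetical; a real model needs far larger training corpora per language.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical character-trigram naive Bayes language guesser (sketch only). */
class NGramLanguageGuesser {
    private static final int N = 3;
    // language -> (trigram -> count)
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();
    // language -> total trigrams seen in its training text
    private final Map<String, Integer> totals = new HashMap<>();
    // all trigrams seen in any language, used for add-one smoothing
    private final Set<String> vocabulary = new HashSet<>();

    // Lowercase and pad with spaces so word-initial and word-final trigrams count.
    private static String pad(String text) {
        return " " + text.toLowerCase() + " ";
    }

    void train(String language, String sample) {
        Map<String, Integer> counts = profiles.computeIfAbsent(language, k -> new HashMap<>());
        String s = pad(sample);
        for (int i = 0; i + N <= s.length(); i++) {
            String gram = s.substring(i, i + N);
            counts.merge(gram, 1, Integer::sum);
            totals.merge(language, 1, Integer::sum);
            vocabulary.add(gram);
        }
    }

    /** Returns the language whose smoothed trigram model gives the text the highest log-likelihood. */
    String guess(String text) {
        String s = pad(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            double denom = totals.getOrDefault(e.getKey(), 0) + vocabulary.size();
            double score = 0.0;
            for (int i = 0; i + N <= s.length(); i++) {
                int c = e.getValue().getOrDefault(s.substring(i, i + N), 0);
                score += Math.log((c + 1) / denom);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NGramLanguageGuesser g = new NGramLanguageGuesser();
        g.train("en", "the quick brown fox jumps over the lazy dog and the cat");
        g.train("fr", "le renard brun rapide saute par dessus le chien paresseux et le chat");
        System.out.println(g.guess("the dog and the cat")); // en
        System.out.println(g.guess("le chien et le chat")); // fr
    }
}
```

Because the score sums evidence over every trigram in the input, classifiers of this kind are usually applied to whole passages or fields rather than single tokens; a lone word that exists in both French and English gives them little to go on.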

To unsubscribe, e-mail:
For additional commands, e-mail:

