lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Randy Darling" <rdarl...@imanage.com>
Subject RE: Any tools to detect language of document
Date Wed, 18 Jun 2003 18:23:48 GMT

Thanks for the information.

I did some quick tests of the ngramj program and it seems to work.

Randy

-----Original Message-----
From: Stephan.Strittmatter@gmx.de [mailto:Stephan.Strittmatter@gmx.de]
Sent: Wednesday, June 18, 2003 2:30 AM
To: Lucene Users List
Subject: RE: Any tools to detect language of document


Have also a look at ngramj at sourceforge.net. I use this Java library.

First I check the language-Meta tag of html page. If it is not avaiable I
use ngramj 
to "guess" it.

Probably this library could also be added to the Lucene Contributions List

Stephan

> look for Ted Dunning algorithm on the web.
> 
> 
> -neil
> 
> -----Original Message-----
> From: Randy Darling [mailto:rdarling@imanage.com]
> Sent: 17 juin, 2003 16:41
> To: Lucene Users List
> Subject: Any tools to detect language of document
> 
> 
> 
> I am attempting to come up with an automated way to
> select which language analyzer to use on a document.
> 
> Anyone know of any algorithms available to detect
> what language the document may be written in?
> 
> Are there any special Analyzers that attempt to support
> multiple languages?
> 
> 
> Thanks,
> Randy
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

-- 
+++ GMX - Mail, Messaging & more  http://www.gmx.net +++
Bitte l├Ącheln! Fotogalerie online mit GMX ohne eigene Homepage!


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message