lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Indexing .txt file containing english, german or french alphabet
Date Sun, 25 Sep 2005 23:33:29 GMT
For dealing with parsing + indexing RTF, see chapter 7 of Lucene in
Action.

For indexing text that has multiple languages.... I don't know what to
recommend.  Well, I do - try the StandardAnalyzer and see if that
produces satisfactory results, but you'd really need a smart analyzer
that knows how to properly tokenize and filter words from multiple
languages, and I haven't heard of anyone doing that here.

Otis

--- tirupathi reddy <tirupathireddy_m@yahoo.com> wrote:

> Hello,
>  
>    I have to index the text in the .txt document. This text document
> contains english characters , german characters etc. Please tell me
> how can I index that text document. Is the procedure of indexing RTF
> documents can be applied here?
>  
> thanx,
> MTREDDY
> 
> 
> 
> Tirupati Reddy Manyam 
> 24-06-08, 
> Sundugaullee-24, 
> 79110 Freiburg 
> GERMANY. 
> 
> Phone: 00497618811257 
> cell : 004917624649007
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around 
> http://mail.yahoo.com 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message