lucene-java-user mailing list archives

From "Grant Ingersoll" <gsing...@syr.edu>
Subject Re: Devnagari Search?
Date Thu, 10 Jun 2004 12:49:36 GMT
I don't have experience with those particular languages, but I can tell you that dealing with
Unicode is mostly a matter of making sure you read the input using the correct character encoding.
Once the bytes are decoded correctly, Java's Strings hold Unicode and Java takes care of the rest.
If you are using a Reader for your Field, you will probably have to do something like:

Reader reader = new InputStreamReader(new FileInputStream(file), "UTF-8");

This assumes your files are stored in UTF-8. If they use a different encoding, pass that
encoding's name in place of "UTF-8".
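
A minimal, self-contained sketch (plain JDK, no Lucene) of why the charset argument matters. It writes a short Hindi word to a temporary file as UTF-8, then reads the same bytes back twice. The class and method names (EncodingDemo, readAll, writeUtf8) are hypothetical, chosen for illustration only. Decoding as UTF-8 restores the original six characters; decoding as ISO-8859-1 turns each of the 18 UTF-8 bytes into an unrelated Latin-1 character, which is the kind of garbage an analyzer would otherwise end up indexing.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class EncodingDemo {

    // Decodes the file's bytes with the named charset and returns the text.
    static String readAll(File file, String encoding) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), encoding))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        }
        return text.toString();
    }

    // Encodes the text as UTF-8 bytes and writes them to the file.
    static void writeUtf8(File file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), "UTF-8")) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        String hindi = "\u0939\u093F\u0928\u094D\u0926\u0940"; // the word "Hindi" in Devanagari
        File file = File.createTempFile("devanagari", ".txt");
        file.deleteOnExit();
        writeUtf8(file, hindi);

        String right = readAll(file, "UTF-8");
        String wrong = readAll(file, "ISO-8859-1");
        System.out.println(right.equals(hindi)); // true
        System.out.println(wrong.equals(hindi)); // false
        System.out.println(right.length() + " chars vs " + wrong.length() + " chars"); // 6 chars vs 18 chars
    }
}
```

Each Devanagari character in this word takes three bytes in UTF-8, which is why the wrongly decoded text is three times as long.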

I would search Google for stemmers and tokenizers for the languages you are interested in.
I also believe someone posted a "generic" stemmer that performed very well. It went to this
list a week or so ago, under a subject like "Writing a stemmer".
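
One thing to check in any tokenizer you find or write: Devanagari vowel signs (matras), the virama and the anusvara are combining marks in Unicode, not letters. A tokenizer whose only rule is Character.isLetter therefore breaks a word like "Hindi" into fragments. The following is a plain-Java sketch with hypothetical names (DevanagariTokenizerSketch, isTokenChar, tokenize), not Lucene's Tokenizer API; it only shows the character test, treating combining marks as part of the word, and compares it with a letter-only rule.

```java
import java.util.ArrayList;
import java.util.List;

public class DevanagariTokenizerSketch {

    // True for characters that may appear inside a word: letters, digits,
    // and combining marks (Devanagari vowel signs, virama, anusvara, etc.).
    static boolean isTokenChar(char c) {
        if (Character.isLetterOrDigit(c)) {
            return true;
        }
        int type = Character.getType(c);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK;
    }

    // Splits text into maximal runs of token characters.
    // keepMarks = false uses a letter-only rule, for comparison.
    static List<String> tokenize(String text, boolean keepMarks) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean inToken = keepMarks ? isTokenChar(c) : Character.isLetter(c);
            if (inToken) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Hindi bhasha" (Hindi language) written in Devanagari
        String text = "\u0939\u093F\u0928\u094D\u0926\u0940 \u092D\u093E\u0937\u093E";
        System.out.println(tokenize(text, true).size());  // 2 (two whole words)
        System.out.println(tokenize(text, false).size()); // 5 (consonant fragments only)
    }
}
```

If you write your own Lucene tokenizer, the same idea would go into whatever method decides which characters belong to a token. The sketch does no normalization or stemming; those would be separate steps in the analyzer chain.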

>>> satishk@it.iitb.ac.in 06/10/04 01:34AM >>>

Has anyone built Lucene for Devanagari Unicode search? Please tell me what
kind of changes I have to make in Lucene.

Also, if anyone has built a StandardTokenizer, Analyzer, Stemmer, Indexer,
or QueryParser for Hindi and Marathi, please let me know.

Thanks,
Satish.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org 
For additional commands, e-mail: lucene-user-help@jakarta.apache.org 


