lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Liaqat Ali <liaqatalim...@gmail.com>
Subject Indexing Non-English text
Date Tue, 04 Dec 2007 10:53:51 GMT
Hi,
I m facing a problem while indexing a small .txt file with Lucene. The 
file which i want to index with lucene is in Urdu language (varient of 
Arabic and Persian). But the Index i get is in Unicode form, not in the 
real form (original Urdu text). This program works good for a file in 
English language. This is the code i use for indexing..

        FileReader file = new FileReader ("urdoc.txt");
        BufferedReader buff = new BufferedReader(file);
        String line = buff.readLine();
        boolean eof = false;
        buff.close();
        String indexDir = "D:\\index";
               Analyzer analyzer = new StandardAnalyzer();
            boolean createFlag = true;
        IndexWriter writer =
                    new IndexWriter(indexDir, analyzer, createFlag);
            Document document  = new Document();
        document.add(new Field("fieldname",line, Field.Store.YES,
        Field.Index.TOKENIZED));
            writer.addDocument(document);
            writer.close();

Kindly guide me, what I should do, would i have to change this code or 
whatever else do you suggest?

Liaqat

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message