lucene-java-user mailing list archives

From Nader Henein <...@bayt.net>
Subject Re: Lucene Arabic Internationalization Question
Date Fri, 27 May 2005 21:19:42 GMT
Dear Rasha,

Sorry for the delay. I've indexed Arabic and English side by side in 
Lucene without trouble; the only thing you have to watch out for is 
stemming. As for indexing PDFs, I have not used that part of the API, 
but in my experience this comes down to using, or in some cases forcing, 
the correct encoding. Debug this by reducing your setup to the lowest 
common denominator: for example, if you're doing this from a web 
service, try it first from the command prompt, so that you only have to 
contend with the OS encoding (UTF-8 is highly recommended) and not the 
browser and server encodings as well.
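
As an illustration of the command-line check described above, here is a 
minimal sketch that uses only the JDK and no Lucene or PDF classes. The 
names EncodingCheck, readAll and countArabicBlock are hypothetical, 
chosen for this example. It assumes the extracted text reaches you as a 
byte stream, and it compares decoding with an explicitly forced UTF-8 
charset against decoding with a wrong one.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Read the whole stream using an explicitly named charset rather
    // than the platform default (hypothetical helper for this sketch).
    public static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Count characters in the Unicode "Arabic" block (U+0600..U+06FF),
    // which also contains the additional Persian letters.
    public static int countArabicBlock(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.UnicodeBlock.of(text.charAt(i))
                    == Character.UnicodeBlock.ARABIC) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample word written with escapes so the source file's own
        // encoding cannot interfere with the test.
        String original = "\u0633\u0644\u0627\u0645";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String forced = readAll(new ByteArrayInputStream(utf8),
                                StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(utf8),
                               StandardCharsets.ISO_8859_1);

        // Forcing UTF-8 should round-trip the text; the wrong charset
        // should yield no Arabic-block characters at all.
        System.out.println("UTF-8 round trip ok: " + forced.equals(original));
        System.out.println("Arabic chars (UTF-8): " + countArabicBlock(forced));
        System.out.println("Arabic chars (ISO-8859-1): " + countArabicBlock(wrong));
    }
}
```

If the count for your own extracted text drops to zero at some stage, 
that stage is probably decoding with the wrong charset, and it is the 
place to force UTF-8 before the text is handed to the indexer.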

A more detailed example of the problem you're facing would help me 
understand it better.

Nader

Rasha wrote:

>Dear Nader,
>
>I have a big problem indexing PDFs that contain Persian words.
>
>lucenePDFIndexer cannot index them, and the words it does index from the PDF are unusable.
>
>
>Is there a way to get them indexed correctly?
>
>
>regards,
>rasha malek

-- 

Nader S. Henein
Senior Applications Architect

Bayt.com






