lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From esra <esraer...@gmail.com>
Subject Re: lucene farsi problem
Date Thu, 01 May 2008 08:36:32 GMT

Hi,

document's encoding is "UTF-8".

i tried the  explain() method and the result for "د-ژ"  range searching is:

  fieldWeight(keywordIndex:ساب وو�ر in 0), product of:
  1.0 = tf(termFreq(keywordIndex:ساب وو�ر)=1)
  0.30685282 = idf(docFreq=1)
  1.0 = fieldNorm(field=keywordIndex, doc=0)

here keywordIndex is "ساب ووفر".

 i also  installed the "luke.jnlp"  but i don't know what to check by Luke.

Thanks,

Esra



Grant Ingersoll-6 wrote:
> 
> I am not sure how Standard Analyzer will perform on Farsi.  The thing  
> to do now would be to get Luke and have a look at the actual document  
> that matches and see what it's tokens look like.  You might also try  
> using the explain() method to see why that document matches.
> 
> Also, are you sure you are loading the file w/ the proper encodings,  
> etc?
> 
> -Grant
> 
> On Apr 30, 2008, at 8:06 AM, esra wrote:
> 
>>
>> Hi,
>> thanks for your reply.
>> I am using StandartAnalyzer now and my xml document is like below:
>>
>> <keyword><![CDATA[ساب ووفر]]></keyword>
>>      <description><![CDATA[یک ووفر که در محفظه ای  
>> جدا از سایر درایور ها
>> قرار دارد تا صدایی با باس فوق العاده  
>> پایین تولید کند. ]]></description>
>>
>> i googled for farsi analyzer and found nothing also i am not sure it  
>> if
>> would solve my problem or not.
>>
>> Thanks,
>>
>> Esra
>>
>>
>> Grant Ingersoll-6 wrote:
>>>
>>> What Analyzer are you using?  You might try looking in Luke to see
>>> what is in your index, etc.  It also isn't clear to me what your
>>> documents look like.
>>>
>>> As for a Farsi analyzer, I would Google "Farsi analyzer Lucene" and
>>> see if you can find anything.  Otherwise, you will have to write your
>>> own (and donate it????)
>>>
>>> -Grant
>>>
>>> On Apr 30, 2008, at 3:21 AM, esra wrote:
>>>
>>>>
>>>> hi,
>>>>
>>>> i am using lucene's "IndexSearcher" to search the given xml by
>>>> keyword which
>>>> contains farsi information.
>>>> while searching i use ranges like
>>>>
>>>> آ-ث  |  ج-خ  |  د-ژ  |  س-ظ  |  ع-ق  |  ک-ل  |  م-ی
>>>>
>>>> when i do search for  "د-ژ"  range the results are wrong , they  
>>>> are
>>>> the
>>>> results of  " س-ظ "range.
>>>>
>>>> for example when i do search for "د-ژ"  one of the the results is
>>>> "ساب ووفر"
>>>> , this result also shown on the " س-ظ " range's result list which
>>>> is the
>>>> corret range.
>>>>
>>>> As IndexSearcher use "compareTo" method and this method uses
>>>> unicodes for
>>>> comparing, i found the unicodes of the characters.
>>>>
>>>> د=U+62F
>>>> ژ = U+698
>>>> and the first letter of "ساب ووفر " is  س = U+633
>>>>
>>>> Do you have any idea how to solve this problem, there are analyzers
>>>> for
>>>> different languages ,
>>>> will this be usefull if so do you know where to find a farsi  
>>>> analyzer?
>>>>
>>>> I would bu glad if you help.
>>>>
>>>> thanks ,
>>>>
>>>> Esra
>>>>
>>>> -- 
>>>> View this message in context:
>>>> http://www.nabble.com/lucene-farsi-problem-tp16977096p16977096.html
>>>> Sent from the Lucene - Java Users mailing list archive at  
>>>> Nabble.com.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>> --------------------------
>>> Grant Ingersoll
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/lucene-farsi-problem-tp16977096p16980977.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
> 
> --------------------------
> Grant Ingersoll
> 
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
> 
> 
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/lucene-farsi-problem-tp16977096p16993174.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message