lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: lucene farsi problem
Date Thu, 01 May 2008 13:28:37 GMT

On May 1, 2008, at 4:36 AM, esra wrote:

>
> Hi,
>
> The document's encoding is "UTF-8".
>
> I tried the explain() method, and the result for the "د-ژ" range
> search is:
>
>  fieldWeight(keywordIndex:ساب ووفر in 0), product of:
>  1.0 = tf(termFreq(keywordIndex:ساب ووفر)=1)
>  0.30685282 = idf(docFreq=1)
>  1.0 = fieldNorm(field=keywordIndex, doc=0)
>
> Here the keywordIndex value is "ساب ووفر".
>
> I also installed "luke.jnlp", but I don't know what to check with
> Luke.
>


http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71

Luke can be used to view your index.  I'm not saying that is your
problem here, but often when I get back results that "seem" incorrect,
the first thing I do is look at my index in Luke and compare the
"incorrect" document with what is in the query to see where the
(mis)match is occurring.  Usually, this analysis shows that my
document or query is not what I thought it was.

Luke can browse documents and parse queries, amongst other useful
things.
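
As a side note on the range question: the sketch below is plain Java,
not Lucene code, and it assumes that range matching comes down to
String.compareTo on the stored term, as the original question further
down suggests.  The class and method names are made up for
illustration.  It only shows where "ساب ووفر" falls relative to the
range bounds in code point order.

```java
public class FarsiRangeCheck {

    // Hypothetical helper: true if term sorts within [lower, upper]
    // under String.compareTo, which compares UTF-16 code units.
    public static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // "ساب ووفر", written with escapes so source encoding does not matter
        String term = "\u0633\u0627\u0628 \u0648\u0648\u0641\u0631";

        // Range د (U+062F) to ژ (U+0698)
        System.out.println(inRange(term, "\u062F", "\u0698")); // prints: true

        // Range س (U+0633) to ظ (U+0638)
        System.out.println(inRange(term, "\u0633", "\u0638")); // prints: true
    }
}
```

Since س is U+0633, it sorts after د (U+062F) and before ژ (U+0698) in
code point order, so under this assumption the term lands inside both
ranges.  That does not by itself show what is in your index, so it is
still worth checking in Luke.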




> Thanks,
>
> Esra
>
>
>
> Grant Ingersoll-6 wrote:
>>
>> I am not sure how StandardAnalyzer will perform on Farsi.  The thing
>> to do now would be to get Luke and have a look at the actual document
>> that matches and see what its tokens look like.  You might also try
>> using the explain() method to see why that document matches.
>>
>> Also, are you sure you are loading the file w/ the proper encodings,
>> etc?
>>
>> -Grant
>>
>> On Apr 30, 2008, at 8:06 AM, esra wrote:
>>
>>>
>>> Hi,
>>> Thanks for your reply.
>>> I am using StandardAnalyzer now, and my XML document looks like this:
>>>
>>> <keyword><![CDATA[ساب ووفر]]></keyword>
>>>     <description><![CDATA[یک ووفر که در محفظه ای
>>> جدا از سایر درایور ها
>>> قرار دارد تا صدایی با باس فوق العاده
>>> پایین تولید کند. ]]></description>
>>>
>>> I googled for a Farsi analyzer and found nothing. I am also not sure
>>> whether it would solve my problem.
>>>
>>> Thanks,
>>>
>>> Esra
>>>
>>>
>>> Grant Ingersoll-6 wrote:
>>>>
>>>> What Analyzer are you using?  You might try looking in Luke to see
>>>> what is in your index, etc.  It also isn't clear to me what your
>>>> documents look like.
>>>>
>>>> As for a Farsi analyzer, I would Google "Farsi analyzer Lucene" and
>>>> see if you can find anything.  Otherwise, you will have to write your
>>>> own (and donate it????)
>>>>
>>>> -Grant
>>>>
>>>> On Apr 30, 2008, at 3:21 AM, esra wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am using Lucene's "IndexSearcher" to search the given XML by
>>>>> keyword, which contains Farsi information.
>>>>> While searching I use ranges like
>>>>>
>>>>> آ-ث  |  ج-خ  |  د-ژ  |  س-ظ  |  ع-ق  |  ک-ل  |  م-ی
>>>>>
>>>>> When I search the "د-ژ" range, the results are wrong: they are the
>>>>> results of the "س-ظ" range.
>>>>>
>>>>> For example, when I search the "د-ژ" range, one of the results is
>>>>> "ساب ووفر". This result is also shown in the result list of the
>>>>> "س-ظ" range, which is the correct range.
>>>>>
>>>>> As IndexSearcher uses the "compareTo" method, and this method
>>>>> compares Unicode values, I looked up the code points of the
>>>>> characters:
>>>>>
>>>>> د = U+062F
>>>>> ژ = U+0698
>>>>> and the first letter of "ساب ووفر" is س = U+0633
>>>>>
>>>>> Do you have any idea how to solve this problem? There are analyzers
>>>>> for different languages. Would one be useful here, and if so, do you
>>>>> know where to find a Farsi analyzer?
>>>>>
>>>>> I would be glad if you could help.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Esra
>>>>>
>>>>> -- 
>>>>> View this message in context:
>>>>> http://www.nabble.com/lucene-farsi-problem-tp16977096p16977096.html
>>>>> Sent from the Lucene - Java Users mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>>
>>>> Lucene Helpful Hints:
>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/lucene-farsi-problem-tp16977096p16980977.html
>>> Sent from the Lucene - Java Users mailing list archive at  
>>> Nabble.com.
>>>
>>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/lucene-farsi-problem-tp16977096p16993174.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>

--------------------------
Grant Ingersoll
