lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Aramorph Analyzer
Date Thu, 16 Dec 2004 13:34:50 GMT
Safarnejad, Ali (AFIS) wrote:
> Actually, one thing worth mentioning about the search is that when searching for
> whole phrases, if there are any ambiguous words in the phrase, the search
> fails to find the document, even if the phrase was copied and pasted from the
> original document.
> So, for example, I have a document containing this phrase: الأجهـــزة الرياسية للمنظمة
> The first two words have only one stem each, but the last word has two stems:
> munaZ~im AND munaZ~am.
> So the entire search query becomes: "Al>jh___zp riyAsiy~ munaZ~im munaZ~am",
> which fails to find any matching documents, whereas a search for
> "Al>jh___zp riyAsiy~" succeeds.
> Even placing the accent over the ZAH (ظ) does not disambiguate the search.
> Has anyone found a workaround for this?

Although my knowledge of Arabic is equal to zero, I suggest that you
check what your query looks like after it is parsed
(Query.toString()), and then compare it to the terms that are actually
stored in the index. It is possible that you are, e.g., applying the stemmer
twice by using the wrong analyzer, or not adding the stemmed terms to the
index, or something similar. I suggest using Luke (http://www.getopt.org/luke)
to diagnose your problem; in the "Search" tab you can also view the final
query terms.
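To make that comparison concrete, here is a minimal sketch in plain Java (no Lucene dependency) that checks a list of query terms against the terms and positions stored for a single document. The class and method names (PhraseDiagnosis, missingTerms, phraseMatches) are hypothetical, and the index contents are an assumption for illustration only: the sketch supposes the analyzer stored both stems of the ambiguous word, munaZ~im and munaZ~am, at the same position. Under that assumption every query term exists in the index, yet the four-term phrase cannot match, because an exact phrase requires each term to sit at the next consecutive position, while the two-word phrase still matches. This is one possible explanation consistent with the reported symptoms, not one confirmed in the thread.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PhraseDiagnosis {

    // Returns the query terms that never occur in the document's term -> positions map.
    static Set<String> missingTerms(List<String> queryTerms, Map<String, List<Integer>> doc) {
        Set<String> missing = new LinkedHashSet<>();
        for (String t : queryTerms) {
            if (!doc.containsKey(t)) {
                missing.add(t);
            }
        }
        return missing;
    }

    // Exact phrase match: query term i must occur at position start + i,
    // where start is any position of the first query term.
    static boolean phraseMatches(List<String> queryTerms, Map<String, List<Integer>> doc) {
        if (queryTerms.isEmpty() || !missingTerms(queryTerms, doc).isEmpty()) {
            return false;
        }
        for (int start : doc.get(queryTerms.get(0))) {
            boolean ok = true;
            for (int i = 1; i < queryTerms.size() && ok; i++) {
                ok = doc.get(queryTerms.get(i)).contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical index contents for the example document: both stems of the
        // third word are ASSUMED to be stored at the same position (2).
        Map<String, List<Integer>> doc = Map.of(
            "Al>jh___zp", List.of(0),
            "riyAsiy~", List.of(1),
            "munaZ~im", List.of(2),
            "munaZ~am", List.of(2));

        List<String> full = List.of("Al>jh___zp", "riyAsiy~", "munaZ~im", "munaZ~am");
        List<String> prefix = List.of("Al>jh___zp", "riyAsiy~");

        System.out.println("missing terms: " + missingTerms(full, doc));
        System.out.println("full phrase matches: " + phraseMatches(full, doc));
        System.out.println("two-word phrase matches: " + phraseMatches(prefix, doc));
    }
}
```

Running it prints "missing terms: []", "full phrase matches: false" and "two-word phrase matches: true". In a real setup you would fill the map from what Luke shows for the document rather than hard-coding it.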


-- 
Best regards,
Andrzej Bialecki
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



