lucene-java-user mailing list archives

From "Safarnejad, Ali (AFIS)" <Ali.Safarne...@fao.org>
Subject RE: Aramorph Analyzer
Date Thu, 16 Dec 2004 14:37:17 GMT
The search query, as I mentioned in my previous email, looks like this:
"Al>jh___zp riyAsiy~ munaZ~im munaZ~am"
All the individual words are in the index; however, the complete
phrase, in double quotes, does not match. Neither does any other phrase
that contains ambiguous stems. And that's the problem.



-----Original Message-----
From: Andrzej Bialecki [mailto:ab@getopt.org] 
Sent: 16 December 2004 14:35
To: Lucene Users List
Subject: Re: Aramorph Analyzer


Safarnejad, Ali (AFIS) wrote:
> Actually, one thing worth mentioning about the search is that, when
> searching for whole phrases, if there are any ambiguous words in the
> phrase, the search fails to find the document, even if the phrase
> was copied and pasted from the original document. So for example, I
> have a document containing this phrase: الأجهـــزة الرياسية للمنظمة
> The first two words only have one stem, but the last word has two stems:
> munaZ~im AND munaZ~am,
> So the entire search query becomes: "Al>jh___zp riyAsiy~ munaZ~im munaZ~am"
> Which fails to find any matching documents.
> Whereas, a search for "Al>jh___zp riyAsiy~" would succeed.
> Even placing the accent over the ZAH (ظ), will not disambiguate the search.
> Has anyone found a workaround for this?

Although my knowledge of Arabic is equal to zero, I suggest that you
check what your query looks like after it is parsed
(Query.toString()), and then compare it to the terms that are actually
stored in the index. There is a chance that you, e.g., apply the stemmer
twice by using an incorrect analyzer, or don't add the stemmed terms to the
index, or similar. I suggest using Luke (http://www.getopt.org/luke) to
diagnose your problem: in the "Search" tab you can also view the final
query terms.
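
To make that comparison concrete, here is a minimal sketch in plain Java; it
does not use Lucene at all. The class PhrasePositionSketch and its methods
index, missingTerms and matchesPhrase are hypothetical names invented for
illustration. It also rests on an assumption that this thread does not
confirm: that the analyzer stores both stems of an ambiguous word at the same
token position. Under that assumption, every term of the query is present in
the index, yet a phrase that expects the two stems at consecutive positions
cannot match.

```java
import java.util.*;

public class PhrasePositionSketch {

    // Toy index for one document: term -> positions where it occurs.
    // Each inner list holds all terms emitted at one token position.
    static Map<String, Set<Integer>> index(List<List<String>> termsPerPosition) {
        Map<String, Set<Integer>> idx = new HashMap<>();
        for (int pos = 0; pos < termsPerPosition.size(); pos++) {
            for (String term : termsPerPosition.get(pos)) {
                idx.computeIfAbsent(term, k -> new TreeSet<>()).add(pos);
            }
        }
        return idx;
    }

    // Query terms that do not occur in the index at all.
    static List<String> missingTerms(Map<String, Set<Integer>> idx, List<String> queryTerms) {
        List<String> missing = new ArrayList<>();
        for (String term : queryTerms) {
            if (!idx.containsKey(term)) {
                missing.add(term);
            }
        }
        return missing;
    }

    // Exact phrase match: term i must occur at position start + i.
    static boolean matchesPhrase(Map<String, Set<Integer>> idx, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        Set<Integer> starts = idx.getOrDefault(phrase.get(0), Collections.emptySet());
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < phrase.size() && ok; i++) {
                Set<Integer> positions = idx.getOrDefault(phrase.get(i), Collections.emptySet());
                ok = positions.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // ASSUMPTION: both stems of the third word share position 2.
        List<List<String>> doc = Arrays.asList(
                Arrays.asList("Al>jh___zp"),
                Arrays.asList("riyAsiy~"),
                Arrays.asList("munaZ~im", "munaZ~am"));
        Map<String, Set<Integer>> idx = index(doc);

        List<String> full = Arrays.asList("Al>jh___zp", "riyAsiy~", "munaZ~im", "munaZ~am");
        List<String> prefix = Arrays.asList("Al>jh___zp", "riyAsiy~");

        System.out.println("missing terms: " + missingTerms(idx, full));
        System.out.println("full phrase matches: " + matchesPhrase(idx, full));
        System.out.println("two-word phrase matches: " + matchesPhrase(idx, prefix));
    }
}
```

If the real index looks like this toy one, the parsed query and the stored
terms would agree term by term and differ only in position, which is why it is
worth checking positions as well as the terms themselves.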


-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
