lucene-general mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: cross-lingual IR
Date Mon, 13 Aug 2007 15:01:14 GMT
Hi Farzad,

Hmmm, where to begin...  This is a tough question and one that  
warrants a fair amount of research.  I would start by taking a look  
at the TREC cross-language tracks and the CLEF conference.

I have used Lucene to index and search English documents as well as
Arabic, French, Spanish, Dutch, and other non-English documents.  In
general, you need some way of transforming a source language query
into a target language query OR you need some way of automatically
translating all your documents into the same language.  How you do
this is really a matter of research, eh?  The most basic approach to
the query transformation problem is to use a bilingual dictionary to
look up each source language term and substitute its target language
equivalents.
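
To make the dictionary approach concrete, here is a minimal sketch in
plain Java with no Lucene dependency.  Everything in it is an assumption
for illustration: the class name DictionaryQueryTranslator, the
three-entry toy dictionary, and the choice to join multiple translations
with OR (which Lucene's classic query parser syntax accepts).  A real
system would load a proper bilingual dictionary and deal with phrases
and ambiguity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: term-by-term dictionary translation of a query. */
class DictionaryQueryTranslator {
    private final Map<String, List<String>> dictionary;

    DictionaryQueryTranslator(Map<String, List<String>> dictionary) {
        this.dictionary = dictionary;
    }

    /** Toy English-to-Persian entries, written as Unicode escapes. */
    static Map<String, List<String>> toyDictionary() {
        Map<String, List<String>> d = new HashMap<>();
        // "ketab" (book)
        d.put("book", Arrays.asList("\u06A9\u062A\u0627\u0628"));
        // "ketabkhaneh" (library)
        d.put("library", Arrays.asList("\u06A9\u062A\u0627\u0628\u062E\u0627\u0646\u0647"));
        // "jostoju" and "kavosh" (two possible renderings of search)
        d.put("search", Arrays.asList("\u062C\u0633\u062A\u062C\u0648",
                                      "\u06A9\u0627\u0648\u0634"));
        return d;
    }

    /** Translates a whitespace-separated source query one term at a time. */
    String translate(String sourceQuery) {
        List<String> clauses = new ArrayList<>();
        String[] terms = sourceQuery.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (String term : terms) {
            if (term.isEmpty()) {
                continue; // happens only for an empty query
            }
            List<String> targets = dictionary.get(term);
            if (targets == null || targets.isEmpty()) {
                clauses.add(term); // no entry: pass through (e.g. proper names)
            } else if (targets.size() == 1) {
                clauses.add(targets.get(0));
            } else {
                // Several candidates: keep them all as alternatives.
                clauses.add("(" + String.join(" OR ", targets) + ")");
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        DictionaryQueryTranslator t = new DictionaryQueryTranslator(toyDictionary());
        System.out.println(t.translate("book search"));
    }
}
```

The translated string would then be parsed and run against the target
language index, using an analyzer suited to that language.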

As for Lucene, you will need an Analyzer that handles Persian.  Try
googling "Persian Lucene Analyzer", but you may very well have to
write your own.  The actual indexing and search tasks are relatively
straightforward as Lucene tasks, and there are a number of good
tutorials and books on how to do that.
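
If you do end up writing your own, part of the work is likely to be
character normalization before tokens reach the index.  The sketch below
is plain Java, not a Lucene Analyzer; the class name
PersianTextNormalizer and the specific mappings (Arabic yeh and kaf
folded to their Persian forms, zero-width non-joiner treated as a word
break) are my assumptions about a reasonable starting point, not a
complete or authoritative rule set.  The same logic could later be moved
into a custom Lucene TokenFilter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: character folding and naive tokenization for Persian. */
class PersianTextNormalizer {

    /** Folds Arabic-script variants to a single Persian form. */
    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u064A': // U+064A ARABIC LETTER YEH
                case '\u0649': // U+0649 ARABIC LETTER ALEF MAKSURA
                    out.append('\u06CC'); // U+06CC ARABIC LETTER FARSI YEH
                    break;
                case '\u0643': // U+0643 ARABIC LETTER KAF
                    out.append('\u06A9'); // U+06A9 ARABIC LETTER KEHEH
                    break;
                case '\u200C': // U+200C ZERO WIDTH NON-JOINER
                    out.append(' '); // assumption: treat it as a word break
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    /** Normalizes, then splits on whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : normalize(text).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "ketab" typed with an Arabic kaf instead of a Persian keheh
        System.out.println(tokenize("\u0643\u062A\u0627\u0628"));
    }
}
```

Whatever rules you settle on, apply the same normalization at index
time and at query time so that both sides produce the same terms.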

Good luck,
Grant

On Aug 13, 2007, at 6:30 AM, Farzad Mahdikhani wrote:

>  Dear All,
>
>  I would like to implement a cross-lingual IR system with support  
> for Persian and English languages for an academic research task.  
> How can I use Lucene for my task? How shall I proceed? What are the  
> requirements?
>
>  Regards,
>  Farzad
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ


