lucene-java-user mailing list archives

From Karl Wettin <karl.wet...@gmail.com>
Subject Re: lucene for Arabic and Urdu
Date Tue, 18 Sep 2007 21:38:21 GMT

On 18 Sep 2007, at 23:23, Liaqat Ali wrote:

> I'm new to the field of Information Retrieval and am now working to
> develop a search engine for languages like Arabic and Urdu. Kindly
> guide me on how Lucene can be utilized for this
> purpose.

Lucene makes no distinction between languages. All data is treated as
discrete chunks of characters, known as tokens. Tokens are represented
in fields, and a token in a specific field is known as a term. Which
tokens your index ends up containing depends on the analyzer strategy
you use. An analyzer can be language-sensitive, but it can also be
something completely different.
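
To make the token/field/term vocabulary concrete, here is a toy sketch
in plain Java with no Lucene dependency. It is not Lucene's API: the
class ToyAnalysis, its Term record and the methods analyze() and
normalizeArabic() are hypothetical. The "language-sensitive" strategy
is an assumption chosen only for illustration: it strips the Arabic
tatweel (U+0640) and the short-vowel marks (U+064B to U+0652), so that
two spellings of the same word end up as the same term. In Lucene
itself these roles are played by an Analyzer and by
org.apache.lucene.index.Term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy illustration only; this is NOT Lucene's API. */
public class ToyAnalysis {

    /** Hypothetical term type: a token in a specific field. */
    record Term(String field, String text) {}

    /** Language-insensitive strategy: split on whitespace, lowercase. */
    static List<String> whitespaceTokens(String input) {
        List<String> out = new ArrayList<>();
        for (String t : input.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Assumed Arabic-aware step: drop tatweel and harakat (short vowels). */
    static String normalizeArabic(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean tatweel = c == '\u0640';
            boolean haraka = c >= '\u064B' && c <= '\u0652';
            if (!tatweel && !haraka) sb.append(c);
        }
        return sb.toString();
    }

    /** The "analyzer strategy": which tokens, and thus terms, a field yields. */
    static List<Term> analyze(String field, String input, boolean arabicAware) {
        List<Term> terms = new ArrayList<>();
        for (String tok : whitespaceTokens(input)) {
            String text = arabicAware ? normalizeArabic(tok) : tok;
            if (!text.isEmpty()) terms.add(new Term(field, text));
        }
        return terms;
    }

    public static void main(String[] args) {
        // "kitab" written with vowel marks, then stretched with tatweel.
        String vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628";
        String stretched = "\u0643\u062A\u0640\u0640\u0627\u0628";
        String text = vocalized + " " + stretched;
        System.out.println(analyze("body", text, false));
        System.out.println(analyze("body", text, true));
    }
}
```

With the plain strategy the two spellings stay two distinct terms; with
the normalizing strategy they collapse into one. Note also that the same
token in a different field (say "title" instead of "body") is a
different term. A real Arabic or Urdu analyzer would likely need much
more than this, which is exactly the kind of choice the analyzer
strategy decides.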

> Can anybody tell me exactly what I should do to design a search
> engine from scratch using Lucene?

You need to define what your search engine is supposed to do in order  
to get an answer that makes sense.


"Lucene in Action" is a pretty good book, even though it covers
Lucene 1.4 or so. The SVN repository contains a demo application.
There is also the Wiki and this mailing list.

-- 
karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

