lucene-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: what if my database data contains other language (like danish, german).
Date Mon, 11 May 2009 15:13:43 GMT
Yes, Lucene can handle that.  You have to select which stemmer (analyzer)
to use for each language.  You may also have to improve the German and
Danish stemmers a little bit.
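
As a rough sketch of what selecting a stemmer per language looks like:
keep a language tag on each document and route its text through the
analysis chain for that language.  The Python below is only an
illustration and is not Lucene code.  The names TOY_SUFFIXES, toy_stem and
analyze are hypothetical, and the suffix lists are made-up stand-ins for
real stemming rules; in Lucene you would plug in the actual per-language
analyzers instead.

```python
# Toy per-language analysis dispatch. The suffix lists are illustrative
# assumptions, not real stemming rules for these languages.
TOY_SUFFIXES = {
    "de": ("ungen", "ung", "en", "er", "e"),
    "da": ("erne", "ene", "er", "en", "e"),
    "en": ("ing", "es", "s"),
}


def toy_stem(token, lang):
    """Strip the first matching suffix for lang, keeping a stem of >= 3 chars."""
    for suffix in TOY_SUFFIXES.get(lang, ()):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token  # unknown language or no suffix matched: leave unchanged


def analyze(text, lang):
    """Lowercase, split on whitespace, and stem with the rules for lang."""
    return [toy_stem(tok, lang) for tok in text.lower().split()]
```

The point of the dispatch is that the same token can stem differently
depending on the language tag, so the tag has to be applied consistently
at index time and at query time.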

You may also run into a term-weighting issue.  If Danish is 5% of your
corpus, then words that occur in 100% of your Danish documents will tend
to get weights that are too high, because they occur in only 5% of your
documents overall and so look rare to the scoring.  Any term that occurs
in more than 20% of a sub-corpus should generally be discarded from your
query.  This can be difficult in multi-lingual situations, since it
requires document frequencies per language rather than for the corpus as
a whole.

For a first pass, I would ignore this issue, however.
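
If you do come back to it later, the 20% rule can be sketched roughly as
follows.  This is plain Python, not Lucene code, and it assumes every
document carries a language tag.  The names subcorpus_doc_freq and
filter_query are hypothetical.  To see the size of the problem: with a
plain log(N/df) weight (a simplification, not Lucene's exact formula), a
word present in every Danish document but in only 5% of the whole corpus
gets log(20), roughly 3, where a per-language count would give it 0.

```python
from collections import Counter, defaultdict


def subcorpus_doc_freq(docs):
    """docs: iterable of (lang, tokens) pairs.

    Returns (n_docs, df): the number of documents per language, and for
    each language a Counter of how many documents contain each term.
    """
    n_docs = Counter()
    df = defaultdict(Counter)
    for lang, tokens in docs:
        n_docs[lang] += 1
        df[lang].update(set(tokens))  # count each term once per document
    return n_docs, df


def filter_query(query_terms, n_docs, df, max_fraction=0.2):
    """Drop terms found in more than max_fraction of any one sub-corpus."""
    kept = []
    for term in query_terms:
        too_common = any(
            df[lang][term] / n_docs[lang] > max_fraction for lang in n_docs
        )
        if not too_common:
            kept.append(term)
    return kept
```

The 0.2 default is just the 20% figure from above; the filter checks each
language separately, so a term that is common in Danish alone is dropped
even though it is rare in the corpus as a whole.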

On Mon, May 11, 2009 at 4:07 AM, uday kumar maddigatla <ukma@mach.com> wrote:

> what if my database data contains other language (like danish, german).
>
> Is Lucene will handle that .
>
> If yes How?
>



-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)
