lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Lucene and multi-lingual Unicode - advice needed
Date Mon, 15 Jun 2009 17:14:19 GMT
Hi,

(Since this is an issue you brought up on the Compass forums)

I wonder what stage of the development process you are in.
Have you considered Solr, or does Compass provide some other
functionality that you need?

I ask because the easiest solution might be to use a nightly Solr
build for your application.

I'm not personally biased toward any particular framework, but some
improvements have recently been added to Solr so that the default
type 'text' is pretty good for multilingual processing.

In fact, I hope this will be improved in Lucene itself in the future,
so that your decision is really based on other application needs...

On Mon, Jun 15, 2009 at 1:10 PM, OBender Hotmail<osya_bender@hotmail.com> wrote:
> Hi All!
>
>
>
> I'm new to Lucene so forgive me if this question was asked before.
>
> I have a database with records in the same table in many different languages
> (up to 70). It includes all W-European, Arabic, Eastern, CJK, Cyrillic, etc.,
> you name it.
> I've looked at what people say about Lucene and it looks like for the most
> part standard analyzers should do fine with most Unicode languages but there
> are quite a few exceptions.
> Here is a recently updated Lucene Jira issue:
> https://issues.apache.org/jira/browse/LUCENE-1488
>
> My question is, what would be the safest bet for me in terms of
> analyzers/tokenizers?
> Do I really have to write my own ones for the bunch of languages that are
> not supported?
> Has anyone already solved a problem similar to mine? I'm sure someone
> already has :)
>
> And yes, I looked at the Lucene sandbox analyzers. They just add more
> confusion. For example, why are there analyzers for DE and FR? Wouldn't the
> standard analyzer (which is Unicode compliant, as I understood) deal with EU
> languages just fine?
>
> Thanks in advance for any advice :)
>
>



-- 
Robert Muir
rcmuir@gmail.com
