lucene-java-user mailing list archives

From Benson Margulies <ben...@basistech.com>
Subject Re: Confuse with Kuromoji
Date Sun, 06 Apr 2014 11:51:57 GMT
You must know what language each text is in, and use an appropriate
analyzer. Some people do this by using a separate field (text_eng,
text_spa, text_jpn). Other people put some extra information at the
beginning of the field, and then make an analyzer that peeks in order to
dispatch to the correct tokenizer.
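
To make the two options concrete, here is a rough sketch in plain Java. It
does not touch the Lucene API at all; it only shows the routing decision.
The class name LanguageRouting, the three-letter codes, and the "jpn|" style
prefix are all made up for illustration, so treat them as assumptions rather
than anything Lucene defines.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: names, codes, and the prefix format are assumptions.
final class LanguageRouting {
    // Option 1: one field per language (text_eng, text_spa, text_jpn).
    private static final Map<String, String> FIELD_BY_LANG = new HashMap<>();
    static {
        FIELD_BY_LANG.put("eng", "text_eng");
        FIELD_BY_LANG.put("spa", "text_spa");
        FIELD_BY_LANG.put("jpn", "text_jpn");
    }

    // Returns the field a text in the given language should be indexed into.
    static String fieldFor(String lang) {
        String field = FIELD_BY_LANG.get(lang);
        if (field == null) {
            throw new IllegalArgumentException("unsupported language: " + lang);
        }
        return field;
    }

    // Result of peeking at a prefixed value: the language code and the real text.
    static final class Tagged {
        final String lang;
        final String body;

        Tagged(String lang, String body) {
            this.lang = lang;
            this.body = body;
        }
    }

    // Option 2: the value starts with a hypothetical marker such as "jpn|".
    // The dispatching analyzer would read the code, then hand the remaining
    // text to the tokenizer for that language.
    static Tagged peek(String value) {
        int sep = value.indexOf('|');
        if (sep != 3) {
            throw new IllegalArgumentException("missing language prefix");
        }
        String lang = value.substring(0, sep);
        fieldFor(lang); // rejects codes we have no analyzer for
        return new Tagged(lang, value.substring(sep + 1));
    }
}
```

With the field-per-language layout, the usual way to attach a different
analyzer to each field in Lucene is PerFieldAnalyzerWrapper, for example
mapping text_jpn to the Kuromoji JapaneseAnalyzer. A mixed-language search
then queries all three fields, each analyzed with its own analyzer.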


On Sat, Apr 5, 2014 at 9:59 PM, <j7a42e4fd7qux@softbank.ne.jp> wrote:

> I am pretty new to Lucene; however, I have no problem understanding what
> it is about.
> My big problem is trying to understand how Kuromoji works. I need to
> implement a search functionality that initially supports English, Spanish
> and Japanese. It doesn't seem to be a big deal with the first two, as I can
> just use analyzers-common to index content in both languages, but when it
> comes to Japanese, it has its own analyzer. I couldn't find any clues about
> combining analyzers, so I still don't know if I can combine all languages
> under the same index (which would be ideal, as I expect mixed searches in
> the context of my project) or whether I have to detect the language first
> and then index Japanese texts separately (which would be a big disadvantage
> when it comes to mixed searches and future localization expansion).
> I found out about Lucene through Kuromoji; it would be great to find a
> solution so that I can use all the greatness that Lucene offers.
