lucene-java-user mailing list archives

From Chris Lu <chris...@gmail.com>
Subject Re: Highlighter apply to Japanese
Date Tue, 06 Sep 2005 06:52:34 GMT
Hi, Koji,

I had the same problem. It arises because the n-gram analysis used for
CJK text produces different tokens than single-character analysis does.

My workaround is to use CJKHighlighter and CJKHighlightAnalyzer from the sandbox.
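
To illustrate why n-gram tokens behave differently, here is a minimal
sketch in plain Java. It is not Lucene code: BigramHighlightSketch and
its methods are hypothetical names, and it assumes the n-gram analysis
in question emits overlapping two-character tokens. Matching tokens then
overlap in the original text, so their offset ranges have to be merged
before the tags are inserted. As in the quoted message, Latin letters
stand in for Japanese characters.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration in plain Java; none of these names are Lucene classes.
class BigramHighlightSketch {

    // Overlapping two-character tokens, returned as {start, end} offsets into the text.
    static List<int[]> bigramOffsets(String text) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            spans.add(new int[] {i, i + 2});
        }
        return spans;
    }

    // Wraps every region of text covered by a bigram of the query in <B>...</B>.
    static String highlight(String text, String query) {
        Set<String> queryGrams = new HashSet<>();
        for (int[] s : bigramOffsets(query)) {
            queryGrams.add(query.substring(s[0], s[1]));
        }

        // Merge matching spans that overlap or touch, so tags are never nested.
        List<int[]> merged = new ArrayList<>();
        for (int[] s : bigramOffsets(text)) {
            if (!queryGrams.contains(text.substring(s[0], s[1]))) {
                continue;
            }
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && s[0] <= last[1]) {
                last[1] = Math.max(last[1], s[1]);
            } else {
                merged.add(new int[] {s[0], s[1]});
            }
        }

        // Copy the text, inserting tags around each merged span.
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int[] m : merged) {
            out.append(text, pos, m[0])
               .append("<B>").append(text, m[0], m[1]).append("</B>");
            pos = m[1];
        }
        return out.append(text, pos, text.length()).toString();
    }

    public static void main(String[] args) {
        // The bigrams "bc" and "cd" overlap at offsets 1-3 and 2-4 and merge into 1-4.
        System.out.println(highlight("abcde", "bcd")); // a<B>bcd</B>e
    }
}
```

A highlighter written for one-token-per-word analyzers may not do this
merging step, which is one reason a CJK-specific highlighter can help.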

-- 
Chris Lu
------------
Lucene Search RAD on Any Database
http://www.dbsight.net


On 9/5/05, Koji Sekiguchi <koji.sekiguchi@m4.dion.ne.jp> wrote:
> Hi again,
> 
> I'm using the Highlighter to highlight terms in Japanese text,
> but I cannot get the output I want.
> 
> If I use StandardAnalyzer or SnowballAnalyzer with English text,
> getBestFragment() returns the expected output:
> 
> Sample: (SnowballAnalyzer)
> Text: A meeting will be held in the City Hall
> TokenStream:
> [a][meet][will][be][held][in][the][citi][hall]
> Query Text: meet
> Output: A <B>meeting</B> will be held in the City Hall
> 
> But if I use JapaneseAnalyzer, the most popular Analyzer in Japan
> for producing a TokenStream from Japanese text, and highlight
> Japanese text with Highlighter, the whole text is highlighted:
> 
> Sample: (JapaneseAnalyzer)
> Text: AMeetingWillBeHeldInTheCityHall
> TokenStream:
> [A][Meeting][Will][Be][Held][In][The][City][Hall]
> Query Text: Meeting
> Output: <B>AMeetingWillBeHeldInTheCityHall</B>
> 
> Please note that I use the Latin alphabet for the Text in the second
> sample because most users on this mailing list can read it; in reality,
> I used Japanese characters for the Text. As you can see,
> JapaneseAnalyzer, which uses a Japanese dictionary in the background
> to extract tokens from the text stream, recognizes the tokens and
> produces a TokenStream. But highlighter.getBestFragment() highlighted
> the whole text.
> 
> Do I need to implement a Fragmenter to highlight tokens correctly
> for Japanese text?
> 
> Thanks in advance,
> 
> Koji
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>
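
One way to see how the whole text could end up highlighted: a
highlighter that inserts tags by token offsets depends entirely on the
start and end offsets each token reports. The sketch below is plain
Java, not Lucene code; OffsetHighlightSketch, Token, and highlight are
hypothetical names. The idea that the Japanese tokens carried offsets
spanning the whole text is an assumption made for illustration, not
something confirmed in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration in plain Java; none of these names are Lucene classes.
class OffsetHighlightSketch {

    // A term plus the character range it claims to cover in the original text.
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Wraps the text range reported by each token whose term equals queryTerm.
    static String highlight(String text, List<Token> tokens, String queryTerm) {
        StringBuilder out = new StringBuilder();
        int pos = 0; // index of the next character not yet copied
        for (Token t : tokens) {
            if (!t.term.equals(queryTerm) || t.start < pos) {
                continue; // not a match, or overlaps text already written
            }
            out.append(text, pos, t.start)
               .append("<B>").append(text, t.start, t.end).append("</B>");
            pos = t.end;
        }
        return out.append(text, pos, text.length()).toString();
    }

    public static void main(String[] args) {
        String text = "AMeetingWillBeHeldInTheCityHall";

        // Token offsets that point at the token itself.
        List<Token> precise = Arrays.asList(
            new Token("A", 0, 1), new Token("Meeting", 1, 8));
        System.out.println(highlight(text, precise, "Meeting"));
        // A<B>Meeting</B>WillBeHeldInTheCityHall

        // Assumed failure mode: every token reports the whole text as its range.
        List<Token> spanning = Arrays.asList(
            new Token("A", 0, text.length()), new Token("Meeting", 0, text.length()));
        System.out.println(highlight(text, spanning, "Meeting"));
        // <B>AMeetingWillBeHeldInTheCityHall</B>
    }
}
```

If this is what is happening, a custom Fragmenter would not help on its
own; printing each token's start and end offsets from the TokenStream
would be a quick way to check.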
