lucene-java-user mailing list archives

From "Koji Sekiguchi" <koji.sekigu...@m4.dion.ne.jp>
Subject Highlighter apply to Japanese
Date Tue, 06 Sep 2005 02:22:05 GMT
Hi again,

I'm using Highlighter to highlight terms in Japanese text,
but I cannot get the output I expect.

If I use StandardAnalyzer or SnowballAnalyzer with English text,
getBestFragment() returns the expected output:

Sample: (SnowballAnalyzer)
Text: A meeting will be held in the City Hall
TokenStream:
[a][meet][will][be][held][in][the][citi][hall]
Query Text: meet
Output: A <B>meeting</B> will be held in the City Hall

But if I use JapaneseAnalyzer, the most popular Analyzer in Japan
for producing a TokenStream from Japanese text, Highlighter
highlights the whole text:

Sample: (JapaneseAnalyzer)
Text: AMeetingWillBeHeldInTheCityHall
TokenStream:
[A][Meeting][Will][Be][Held][In][The][City][Hall]
Query Text: Meeting
Output: <B>AMeetingWillBeHeldInTheCityHall</B>

Please note that I wrote the Text of the second sample in the Latin
alphabet so that most readers of this mailing list can follow it;
in reality the Text consisted of Japanese characters. As the sample
shows, JapaneseAnalyzer, which uses a Japanese dictionary in the
background to extract tokens from the text stream, does recognize
the individual tokens and produces a TokenStream. Even so,
highlighter.getBestFragment() highlighted the whole text.
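
Highlighter places its markup using the start and end offsets that
each token reports, so one possible explanation (an assumption, not
something confirmed here) is that the tokens carry offsets spanning
the whole text. The sketch below does not use Lucene at all; Tok,
tokens() and highlight() are hypothetical stand-ins that only
illustrate how offset-driven markup reproduces both outputs above.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetHighlightSketch {

    /** Hypothetical stand-in for an analyzer token: term plus offsets into the text. */
    public static final class Tok {
        final String term;
        final int start; // inclusive character offset
        final int end;   // exclusive character offset

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Builds tokens for terms that appear back to back in the text.
     * With accurateOffsets == false every token claims the whole text,
     * which is the assumed faulty behaviour, not a confirmed one.
     */
    public static List<Tok> tokens(String[] terms, int textLength, boolean accurateOffsets) {
        List<Tok> result = new ArrayList<>();
        int pos = 0;
        for (String term : terms) {
            if (accurateOffsets) {
                result.add(new Tok(term, pos, pos + term.length()));
            } else {
                result.add(new Tok(term, 0, textLength));
            }
            pos += term.length();
        }
        return result;
    }

    /** Wraps each token matching queryTerm in <B>...</B>, trusting only its offsets. */
    public static String highlight(String text, List<Tok> tokens, String queryTerm) {
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the text already copied to the output
        for (Tok t : tokens) {
            if (!t.term.equals(queryTerm) || t.start < last) {
                continue; // not a hit, or overlaps text that was already emitted
            }
            out.append(text, last, t.start);
            out.append("<B>").append(text, t.start, t.end).append("</B>");
            last = t.end;
        }
        out.append(text.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "AMeetingWillBeHeldInTheCityHall";
        String[] terms = {"A", "Meeting", "Will", "Be", "Held", "In", "The", "City", "Hall"};

        // prints: A<B>Meeting</B>WillBeHeldInTheCityHall
        System.out.println(highlight(text, tokens(terms, text.length(), true), "Meeting"));

        // prints: <B>AMeetingWillBeHeldInTheCityHall</B>
        System.out.println(highlight(text, tokens(terms, text.length(), false), "Meeting"));
    }
}
```

If printing the start and end offsets of the tokens produced by
JapaneseAnalyzer shows a pattern like the second case, the analyzer's
offsets rather than the Fragmenter would be the place to look.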

Do I need to implement a Fragmenter to highlight tokens correctly
in Japanese text?

Thanks in advance,

Koji





