lucene-java-user mailing list archives

From "Koji Sekiguchi" <koji.sekigu...@m4.dion.ne.jp>
Subject RE: Highlighter apply to Japanese
Date Tue, 06 Sep 2005 08:45:07 GMT
I added the code you suggested, and the result is as follows:

Text: AaaBCcDdEFGgHhIiJKkLMmN

	Pos	start	end
	Inc	Ofst	Ofst
[Aaa]	1	0	3
[B]	1	3	4
[Cc]	1	4	6
[Dd]	1	6	8
[E]	1	8	9
[F]	1	9	10
[Gg]	1	10	12
[Hh]	1	12	14
[Ii]	1	14	16
[J]	1	16	17
[Kk]	1	17	19
[L]	1	19	20
[Mm]	1	20	22
[N]	1	22	23

Output:
<B>AaaBCcDdEFGgHhIiJKkLMmN</B>

It seems to me that JapaneseAnalyzer produces correct tokens:
each token starts where the previous one ends, and none of them overlap.
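
To double-check the offsets, here is a minimal sketch in plain Java (no Lucene
classes) of the grouping rule as I understand it from your description. The
class name OffsetOverlapCheck and the rule "a token joins the current group
when its start offset is less than the group's end offset" are my assumptions
for illustration, not the Highlighter's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: it models only token offsets.
class OffsetOverlapCheck {

    // Start and end offsets of the 14 tokens listed above, in stream order.
    static final int[][] OFFSETS = {
        {0, 3}, {3, 4}, {4, 6}, {6, 8}, {8, 9}, {9, 10}, {10, 12},
        {12, 14}, {14, 16}, {16, 17}, {17, 19}, {19, 20}, {20, 22}, {22, 23}
    };

    // Assumed overlap rule: a token joins the current group when its start
    // offset is strictly less than the group's end offset; otherwise it
    // starts a new group. Each returned element is {groupStart, groupEnd}.
    static List<int[]> group(int[][] offsets) {
        List<int[]> groups = new ArrayList<>();
        int[] current = null;
        for (int[] tok : offsets) {
            if (current != null && tok[0] < current[1]) {
                // Overlaps the current group: extend the group's end offset.
                current[1] = Math.max(current[1], tok[1]);
            } else {
                // Distinct from the current group: open a new one.
                current = new int[] {tok[0], tok[1]};
                groups.add(current);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<int[]> groups = group(OFFSETS);
        System.out.println("tokens=" + OFFSETS.length + " groups=" + groups.size());
    }
}
```

Under that assumed rule, the offsets above give one group per token, whereas
an overlapping bigram stream such as (0,2), (1,3), (2,4) collapses into a
single group. So if the real Highlighter applies a rule like this one, these
tokens alone should not be merged into one TokenGroup.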

Any thoughts?

Koji

> -----Original Message-----
> From: markharw00d [mailto:markharw00d@yahoo.co.uk] 
> Sent: Tuesday, September 06, 2005 3:37 PM
> To: java-user@lucene.apache.org
> Subject: Re: Highlighter apply to Japanese
> 
> 
> I don't know the behaviour of the Japanese Analyzer you are using.
> Can you add to your example diagnosis the Token.getPositionIncrement, 
> Token.startOffset and Token.endOffset for each of the tokens?
> 
> The highlighter groups tokens with overlapping start and end offsets
> into a single TokenGroup for the purposes of highlighting. This allows
> TokenStreams which produce multiple synonyms for the same source token
> to work. This behaviour was also required to get the CJKAnalyzer to
> work. It could be that the Analyzer you are using is producing a stream
> of tokens which *all* overlap?
> 
> Cheers
> Mark
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 




