From: Matthew Hall
Date: Thu, 02 Jul 2009 08:59:33 -0400
To: java-user@lucene.apache.org
Subject: Re: Highligheter fails using JapaneseAnalyzer
Message-ID: <4A4CAF35.6050206@informatics.jax.org>

Out of curiosity, when you try your other test string "aaa _bbb ccc",
what do the token byte offsets show?

Matt

Mark Harwood wrote:
>
> On 1 Jul 2009, at 17:39, k.sayama wrote:
>
>> I could verify Token byte offsets
>>
>> The system outputs
>> aaa:0:3
>> bbb:0:3
>> ccc:4:7
>>
>
> That explains the highlighter behaviour. Clearly BBB is not at
> position 0-3 in the String you supplied
>
>>>> String CONTENTS = "AAA :BBB CCC";
>
> Looks like the Tokenizer needs fixing. Is this yours or a standard
> Lucene class? If the latter, raising a JIRA bug with a JUnit test
> would be the best way to get things moving.
>
>
> Cheers
> Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012
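
The offset check Mark describes can be sketched without Lucene at all. The stand-alone Java below is only an illustration: OffsetCheck, Tok and mismatches are hypothetical names, not Lucene classes, and the token list is typed in by hand from the output quoted in this thread rather than read from an analyzer. It assumes the offsets are character positions into the Java String and that terms are compared ignoring case.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetCheck {

    /** A token as reported in the thread: term text plus start/end offsets. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Describes every token whose offsets do not point at its own text in
     * the source string. Case is ignored because the reported terms are
     * lower-cased while the source string is upper-case.
     */
    static List<String> mismatches(String source, List<Tok> tokens) {
        List<String> problems = new ArrayList<String>();
        for (Tok t : tokens) {
            boolean inRange = t.start >= 0 && t.start <= t.end && t.end <= source.length();
            String covered = inRange ? source.substring(t.start, t.end) : null;
            if (covered == null || !covered.equalsIgnoreCase(t.term)) {
                problems.add(t.term + ":" + t.start + ":" + t.end
                        + " covers \"" + covered + "\"");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // String and offsets copied from the messages quoted in this thread.
        String contents = "AAA :BBB CCC";
        List<Tok> reported = new ArrayList<Tok>();
        reported.add(new Tok("aaa", 0, 3));
        reported.add(new Tok("bbb", 0, 3));
        reported.add(new Tok("ccc", 4, 7));

        for (String problem : mismatches(contents, reported)) {
            System.out.println(problem);
        }
    }
}
```

For the string exactly as quoted here, this flags bbb:0:3 (which covers "AAA") and ccc:4:7 (which covers ":BB"), while aaa:0:3 passes. A check of this shape could serve as the core of the JUnit test Mark suggests, with the hand-typed list replaced by the tokens the analyzer actually emits.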