From: Koji Sekiguchi
Date: Fri, 12 Dec 2008 00:08:58 +0900
To: java-user@lucene.apache.org
Subject: problem on NGram+Highlighter
Message-ID: <49412D0A.60500@r.email.ne.jp>

Hello,

I have a problem when using n-gram and highlighter.
I thought it had been solved in this ticket:

http://issues.apache.org/jira/browse/LUCENE-627

I actually found this problem while using CJKTokenizer on Solr, but
here is a Lucene program that reproduces it using
NGramTokenizer(min=2,max=2) instead of CJKTokenizer:

// Imports assume the Lucene 2.4-era package layout
// (core plus the contrib analyzers and highlighter jars).
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class TestNGramHighlighter {

  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new NGramAnalyzer();
    final String TEXT = "ABCDEFGHIJKLMNABCDEFGHIJKLMN";
    final String QUERY = "GHI";

    // Parse the query with the same 2-gram analyzer used for highlighting.
    QueryParser parser = new QueryParser("f", analyzer);
    Query query = parser.parse(QUERY);

    QueryScorer scorer = new QueryScorer(query, "f");
    Highlighter h = new Highlighter(scorer);
    System.out.println(h.getBestFragment(analyzer, "f", TEXT));
  }

  // Analyzer that splits its input into 2-grams only (min=2, max=2).
  static class NGramAnalyzer extends Analyzer {
    public TokenStream tokenStream(String field, Reader input) {
      return new NGramTokenizer(input, 2, 2);
    }
  }
}

The expected output is:

ABCDEFGHIJKLMNABCDEFGHIJKLMN

but the actual output is:

ABCDEFGHIJKLMNABCDEFGHIJKLMN

Am I missing something?

Thank you,

Koji
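For readers following along, here is a minimal plain-Java sketch (no Lucene dependency) of the offsets involved when the text and the query are both split into 2-grams. It is not the Highlighter's actual logic. The class BigramOffsets and its helpers matchingBigramOffsets and merge are hypothetical names used only for illustration. It shows that the query "GHI" becomes the overlapping 2-grams "GH" and "HI", so each occurrence in the text yields two overlapping token spans that would have to be merged into one highlighted region.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramOffsets {

    // Hypothetical helper: returns {start, end} offsets of every 2-gram in
    // text that is also one of the 2-grams of query.
    static List<int[]> matchingBigramOffsets(String text, String query) {
        List<String> queryGrams = new ArrayList<>();
        for (int i = 0; i + 2 <= query.length(); i++) {
            queryGrams.add(query.substring(i, i + 2));
        }
        List<int[]> hits = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            if (queryGrams.contains(text.substring(i, i + 2))) {
                hits.add(new int[] {i, i + 2});
            }
        }
        return hits;
    }

    // Merges spans that overlap or touch. Assumes spans are sorted by start
    // offset, which holds for the left-to-right scan above.
    static List<int[]> merge(List<int[]> spans) {
        List<int[]> merged = new ArrayList<>();
        for (int[] s : spans) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && s[0] <= last[1]) {
                last[1] = Math.max(last[1], s[1]);
            } else {
                merged.add(new int[] {s[0], s[1]});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String text = "ABCDEFGHIJKLMNABCDEFGHIJKLMN";
        for (int[] span : merge(matchingBigramOffsets(text, "GHI"))) {
            System.out.println(span[0] + "-" + span[1] + " "
                    + text.substring(span[0], span[1]));
        }
    }
}
```

With the TEXT and QUERY from the program above, this prints the spans 6-9 and 20-23, each covering "GHI".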