Date: Wed, 21 Sep 2011 14:15:09 +0000 (UTC)
From: "Julian Reschke (JIRA)"
To: dev@jackrabbit.apache.org
Subject: [jira] [Commented] (JCR-3075) incorrect HTML excerpt generation for queries on japanese text content

    [
https://issues.apache.org/jira/browse/JCR-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109519#comment-13109519 ]

Julian Reschke commented on JCR-3075:
-------------------------------------

ExcerptTest#testEncodeIllegalCharsNoHighlights fails because it was switched from testing excerpt(text) to testing excerpt(.). Other than that, the new tests look good.

I know next to nothing about Lucene, but if the code makes the new tests pass without breaking any existing ones, then this should probably go in.

Thanks for taking over!

> incorrect HTML excerpt generation for queries on japanese text content
> -----------------------------------------------------------------------
>
>                 Key: JCR-3075
>                 URL: https://issues.apache.org/jira/browse/JCR-3075
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core
>            Reporter: Julian Reschke
>            Priority: Minor
>         Attachments: JCR-3075.patch
>
>
> The generated excerpt highlights single characters instead of full words. Test case (to be added to FullTextQueryTest):
>
>     public void testJapaneseAndHighlight() throws RepositoryException {
>         // http://translate.google.com/#auto|en|%E3%82%B3%E3%83%B3%E3%83%86%E3%83%B3%E3%83%88
>         String jContent = "\u30b3\u30fe\u30c6\u30f3\u30c8";
>         // http://translate.google.com/#auto|en|%E3%83%86%E3%82%B9%E3%83%88
>         String jTest = "\u30c6\u30b9\u30c8";
>
>         String content = "some text with japanese: " + jContent
>                 + " ('content')" + " and " + jTest + " ('test').";
>         // expected excerpt; note this may change if excerpt providers change
>         String expectedExcerpt = "<div><span>some text with japanese: " + jContent
>                 + " ('content') and <strong>" + jTest
>                 + "</strong> ('test').</span></div>";
>
>         Node n = testRootNode.addNode("node1");
>         n.setProperty("title", content);
>         testRootNode.getSession().save();
>
>         String xpath = "/jcr:root" + testRoot + "/element(*, nt:unstructured)"
>                 + "[jcr:contains(., '" + jTest + "')]/rep:excerpt(.)";
>         Query q = superuser.getWorkspace().getQueryManager()
>                 .createQuery(xpath, Query.XPATH);
>
>         QueryResult qr = q.execute();
>         RowIterator it = qr.getRows();
>         int cnt = 0;
>         while (it.hasNext()) {
>             cnt++;
>             Row found = it.nextRow();
>             assertEquals(n.getPath(), found.getPath());
>             String excerpt = found.getValue("rep:excerpt(.)").getString();
>             assertEquals(expectedExcerpt, excerpt);
>         }
>
>         assertEquals(1, cnt);
>     }