Date: Thu, 23 Jun 2011 22:06:47 +0000 (UTC)
From: "Robert Muir (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <982514349.34770.1308866807827.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1119584066.34030.1308850908325.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

[
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054133#comment-13054133 ]

Robert Muir commented on LUCENE-3234:
-------------------------------------

Oh, that's OK. I just meant a tiny benchmark that hits the nasty case we suspect might be n^2. If the little test case does that, then that will work; I just wasn't sure whether it did. Either way, it's just something to look at in the profiler, etc.

> Provide limit on phrase analysis in FastVectorHighlighter
> ---------------------------------------------------------
>
>                 Key: LUCENE-3234
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3234
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mike Sokolov
>         Attachments: LUCENE-3234.patch
>
>
> With larger documents, FVH can spend a lot of time trying to find the best-scoring snippet as it examines every possible phrase formed from matching terms in the document. If one is willing to accept less-than-perfect scoring by limiting the number of phrases that are examined, substantial speedups are possible. This is analogous to the Highlighter limit on the number of characters to analyze.
> The patch includes an artificial test case that shows > 1000x speedup. In a more normal test environment, with English documents and random queries, I am seeing speedups of around 3-10x when setting phraseLimit=1, which has the effect of selecting the first possible snippet in the document. Most of our sites operate in this way (just show the first snippet), so this would be a big win for us.
> With phraseLimit = -1, you get the existing FVH behavior. At larger values of phraseLimit, you may not get substantial speedup in the normal case, but you do get the benefit of protection against blow-up in pathological cases.

--
This message is automatically generated by JIRA.
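For readers of the archive, here is a minimal sketch of the phraseLimit idea described in the issue. It is not the code from LUCENE-3234.patch and does not use any Lucene classes. The names PhraseLimitSketch, Candidate and best are hypothetical, and the model (candidate phrases visited in document order, each with a precomputed score) is an assumption made only for illustration. It shows that phraseLimit=1 selects the first candidate and that phraseLimit=-1 examines everything.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration only: a hypothetical model of "stop after phraseLimit candidates".
final class PhraseLimitSketch {

    /** Hypothetical candidate phrase: where it starts and how well it scores. */
    static final class Candidate {
        final int startOffset;
        final double score;

        Candidate(int startOffset, double score) {
            this.startOffset = startOffset;
            this.score = score;
        }
    }

    /**
     * Returns the best-scoring candidate among the first phraseLimit candidates,
     * visited in document order. A phraseLimit of -1 means "no limit"
     * (examine every candidate). Returns null if nothing was examined.
     */
    static Candidate best(List<Candidate> inDocumentOrder, int phraseLimit) {
        Candidate best = null;
        int examined = 0;
        for (Candidate c : inDocumentOrder) {
            if (phraseLimit >= 0 && examined >= phraseLimit) {
                break; // limit reached: accept a possibly less-than-perfect result
            }
            examined++;
            if (best == null || c.score > best.score) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate(10, 1.0),
                new Candidate(250, 3.0),
                new Candidate(900, 2.0));
        System.out.println(best(candidates, 1).startOffset);  // first candidate only
        System.out.println(best(candidates, -1).startOffset); // all candidates examined
    }
}
```

In this toy model the cost is bounded by phraseLimit rather than by the number of candidates, which is the trade-off the issue describes: a possibly lower-scoring snippet in exchange for bounded work.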