Message-ID: <744698589.1257428495037.JavaMail.jira@brutus>
Date: Thu, 5 Nov 2009 13:41:35 +0000 (UTC)
From: "Christopher Morris (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Created: (LUCENE-2035) TokenSources.getTokenStream() does not assign positionIncrement

TokenSources.getTokenStream() does not assign positionIncrement
---------------------------------------------------------------

                 Key: LUCENE-2035
                 URL: https://issues.apache.org/jira/browse/LUCENE-2035
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/highlighter
    Affects Versions: 2.9, 2.4.1, 2.4
            Reporter: Christopher Morris

TokenSources.StoredTokenStream does not assign positionIncrement information. This means that all tokens in the stream are considered adjacent. This has implications for the phrase highlighting in QueryScorer when using non-contiguous tokens.

For example:

Consider a token stream that creates tokens for both the stemmed and unstemmed version of each word -
the fox (jump|jumped)
When retrieved from the index using TokenSources.getTokenStream(tpv,false), the token stream will be -
the fox jump jumped

Now try a search and highlight for the phrase query "fox jumped". The search will correctly find the document; the highlighter will fail to highlight the phrase because it thinks that there is an additional word between "fox" and "jumped". If we use the original (from the analyzer) token stream then the highlighter works.

Also, consider the converse -
the fox did not jump
"not" is a stop word and there is an option to increment the position to account for stop words -
(the,0) (fox,1) (did,2) (jump,4)
When retrieved from the index using TokenSources.getTokenStream(tpv,false), the token stream will be -
(the,0) (fox,1) (did,2) (jump,3).

So the phrase query "did jump" will cause the "did" and "jump" terms in the text "did not jump" to be highlighted. If we use the original (from the analyzer) token stream then the highlighter works correctly.
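
The two cases in the report can be reproduced with a small toy model. This is only a sketch: the class PhrasePositionSketch, its Token type, and its helper methods are hypothetical names invented for illustration and are not Lucene's API. It assumes a convention where the first token, with increment 1, lands at position 0, and it treats a two-term phrase as matching only when the terms sit at consecutive positions. Replacing every increment with 1 stands in for the behaviour described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical classes, not Lucene's TokenStream or highlighter API.
final class PhrasePositionSketch {

    /** A term plus its position increment relative to the previous token. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** "the fox jumped", with the stem and the original form stacked at one position. */
    static List<Token> stemmedExample() {
        return Arrays.asList(new Token("the", 1), new Token("fox", 1),
                new Token("jump", 1), new Token("jumped", 0));
    }

    /** "the fox did not jump", with "not" removed and its position kept as a gap. */
    static List<Token> stopWordExample() {
        return Arrays.asList(new Token("the", 1), new Token("fox", 1),
                new Token("did", 1), new Token("jump", 2));
    }

    /** Converts increments to absolute positions (assumed convention: start at -1). */
    static int[] absolutePositions(List<Token> tokens) {
        int[] positions = new int[tokens.size()];
        int position = -1;
        for (int i = 0; i < tokens.size(); i++) {
            position += tokens.get(i).positionIncrement;
            positions[i] = position;
        }
        return positions;
    }

    /** Simulates the reported behaviour: every increment is replaced by 1. */
    static List<Token> dropIncrements(List<Token> tokens) {
        List<Token> flattened = new ArrayList<Token>();
        for (Token token : tokens) {
            flattened.add(new Token(token.term, 1));
        }
        return flattened;
    }

    /** True if 'second' occurs exactly one position after 'first'. */
    static boolean matchesPhrase(List<Token> tokens, String first, String second) {
        int[] positions = absolutePositions(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).term.equals(first)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (tokens.get(j).term.equals(second) && positions[j] == positions[i] + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> stemmed = stemmedExample();
        List<Token> stopped = stopWordExample();
        System.out.println(matchesPhrase(stemmed, "fox", "jumped"));                 // true
        System.out.println(matchesPhrase(dropIncrements(stemmed), "fox", "jumped")); // false
        System.out.println(matchesPhrase(stopped, "did", "jump"));                   // false
        System.out.println(matchesPhrase(dropIncrements(stopped), "did", "jump"));   // true
    }
}
```

With the increments intact, "fox jumped" matches and "did jump" does not. Once every increment is forced to 1 the results flip, which is the missed highlight and the spurious highlight described in the report.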