From: Karolina Bernat
Date: Fri, 28 Jan 2011 16:41:26 +0100
Subject: Token position vs. token offset - how to bring them together?
To: java-user@lucene.apache.org

Hello,

Now that I have made progress with my offset-information problem for HTML files, I have run into a new one: combining token position information with token/term offset information.

Can someone tell me how I can get a token if I know its position? It would be nice to get the token's position from the Token class, but I could only get the positionIncrement, which is not really helpful.

What I am actually trying to do is find the offset information of a span/phrase query. I know that the contrib highlighter can highlight phrase queries, but I want/need to do it on my own (or rather, pass the information to another application that does the highlighting of my documents).

I also could not really understand how the highlighter recognizes that the individual tokens/terms belong to the phrase. For example, if I search for "peter pan", at the moment I also get the tokens 'peter' and 'pan' as weighted terms, even when they occur individually.

Thanks so much in advance!
Karolina
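A minimal, self-contained sketch of one way to bring positions and offsets together, assuming the tokens have already been read from the analyzer's TokenStream (term text, PositionIncrementAttribute and OffsetAttribute). The class PhraseOffsets, the records SimpleToken and Span, and the methods byPosition, termAt and findPhrase are hypothetical names for illustration, not Lucene API. The idea: a token's absolute position is the running sum of the position increments (starting at -1, so the first token lands on position 0). Indexing tokens by that position answers "which token is at position N", and an exact phrase hit is then the query terms occupying consecutive positions, whose offsets run from the first token's start to the last token's end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PhraseOffsets {

    // Hypothetical stand-in for the data a Lucene TokenStream exposes per token:
    // term text, position increment (PositionIncrementAttribute) and
    // character offsets (OffsetAttribute).
    record SimpleToken(String term, int positionIncrement, int startOffset, int endOffset) {}

    // Hypothetical result type: a phrase hit as character offsets into the source text.
    record Span(int startOffset, int endOffset) {}

    // Absolute position = running sum of position increments, starting at -1,
    // so a first token with increment 1 lands on position 0. Several tokens may
    // share a position (increment 0, e.g. injected synonyms), hence the list.
    static Map<Integer, List<SimpleToken>> byPosition(List<SimpleToken> tokens) {
        Map<Integer, List<SimpleToken>> index = new TreeMap<>();
        int position = -1;
        for (SimpleToken t : tokens) {
            position += t.positionIncrement();
            index.computeIfAbsent(position, p -> new ArrayList<>()).add(t);
        }
        return index;
    }

    // Returns the token carrying the given term at one position, or null.
    static SimpleToken termAt(Map<Integer, List<SimpleToken>> index, int position, String term) {
        for (SimpleToken t : index.getOrDefault(position, List.of())) {
            if (t.term().equals(term)) {
                return t;
            }
        }
        return null;
    }

    // Exact phrase match (no slop): the phrase terms must occupy consecutive
    // positions. Each hit spans from the first token's start offset to the
    // last token's end offset; isolated occurrences of a term are not returned.
    static List<Span> findPhrase(List<SimpleToken> tokens, List<String> phrase) {
        List<Span> hits = new ArrayList<>();
        if (phrase.isEmpty()) {
            return hits;
        }
        Map<Integer, List<SimpleToken>> index = byPosition(tokens);
        for (int start : index.keySet()) {
            SimpleToken first = termAt(index, start, phrase.get(0));
            SimpleToken last = first;
            for (int i = 1; i < phrase.size() && last != null; i++) {
                last = termAt(index, start + i, phrase.get(i));
            }
            if (first != null && last != null) {
                hits.add(new Span(first.startOffset(), last.endOffset()));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Lowercased tokens of "Peter met Peter Pan." with offsets into that text.
        List<SimpleToken> tokens = List.of(
            new SimpleToken("peter", 1, 0, 5),
            new SimpleToken("met", 1, 6, 9),
            new SimpleToken("peter", 1, 10, 15),
            new SimpleToken("pan", 1, 16, 19));
        System.out.println(byPosition(tokens).get(2)); // the token(s) at position 2
        System.out.println(findPhrase(tokens, List.of("peter", "pan")));
        // prints [Span[startOffset=10, endOffset=19]]
    }
}
```

The first "peter" (position 0) is not reported because position 1 holds "met", not "pan". Likewise, if a stop filter removes "the" from "Peter the Pan" and preserves the gap as an increment of 2, 'peter' and 'pan' end up at positions 0 and 2 and the sketch reports no hit. Phrase slop (as in PhraseQuery.setSlop) is not modeled here.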