Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
From: Karolina Bernat <karolina.bernat@googlemail.com>
Date: Fri, 28 Jan 2011 16:41:26 +0100
Message-ID: <AANLkTin=c_qLGW_2yZGta9DPi1OOACcA74yCr=URfXEs@mail.gmail.com>
Subject: Token position vs. token offset - how to bring them together?
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=0016360e3d305a0aef049ae9e7ad

--0016360e3d305a0aef049ae9e7ad
Content-Type: text/plain; charset=ISO-8859-1

Hello,

Now that I have made progress on my offset-info problem in HTML files, I
have a new one: bringing a token's position information together with its
token/term offset information. Can someone tell me how I can get a token if
I know its position? It would be nice to get the token's position from the
Token class, but I could only get the positionIncrement, which is not
really helpful on its own.
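
To illustrate what I mean by bringing the two together, here is a minimal
sketch that sums the position increments into absolute positions and keeps
each token's offsets next to its position. It does not use Lucene itself:
the Tok class is a hypothetical stand-in for the increment and offset values
a Token carries, and starting the counter at -1 (so that a first token with
increment 1 lands on position 0) is an assumption about the convention.

```java
import java.util.ArrayList;
import java.util.List;

final class PositionIndex {

    // Hypothetical stand-in for the values a token carries:
    // term text, position increment, start offset and end offset.
    static final class Tok {
        final String term;
        final int positionIncrement;
        final int startOffset;
        final int endOffset;

        Tok(String term, int positionIncrement, int startOffset, int endOffset) {
            this.term = term;
            this.positionIncrement = positionIncrement;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Returns a list indexed by absolute position; each entry holds the
    // tokens found at that position (empty where a token was skipped).
    static List<List<Tok>> byPosition(List<Tok> stream) {
        List<List<Tok>> positions = new ArrayList<>();
        int position = -1; // assumed convention: increment 1 on the first token gives 0
        for (Tok t : stream) {
            position += t.positionIncrement;
            if (position < 0) {
                position = 0; // guard against a leading increment of 0
            }
            while (positions.size() <= position) {
                positions.add(new ArrayList<>());
            }
            positions.get(position).add(t);
        }
        return positions;
    }
}
```

For the text "peter pan and peter" with "and" removed as a stop word, the
last token would arrive with increment 2, so position 2 stays empty and the
second 'peter' ends up at position 3 with offsets 14-19.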

What I'm actually trying to do is find the offset information of a
span/phrase query. I know that the contrib highlighter can highlight phrase
queries, but I want/need to do it on my own (or rather pass the information
to another application that does the highlighting of my documents). I also
couldn't really understand how the highlighter recognizes that the
individual tokens/terms belong to the phrase (i.e. if I search for "peter
pan", at the moment I also get the tokens 'peter' and 'pan' as weighted
terms, even when they occur individually).
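
The result I am after could be sketched roughly as follows. This is a naive
illustration and not how the contrib highlighter works: it matches an exact
phrase (no slop) by requiring the terms to sit at consecutive absolute
positions, and reports the start offset of the first term and the end offset
of the last. The Tok class and the find method are hypothetical names, and
the sketch assumes that no two tokens share a position (no synonyms with
increment 0).

```java
import java.util.ArrayList;
import java.util.List;

final class PhraseOffsets {

    // Hypothetical stand-in for the values a token carries.
    static final class Tok {
        final String term;
        final int positionIncrement;
        final int startOffset;
        final int endOffset;

        Tok(String term, int positionIncrement, int startOffset, int endOffset) {
            this.term = term;
            this.positionIncrement = positionIncrement;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Returns one {startOffset, endOffset} pair per exact phrase occurrence.
    static List<int[]> find(List<Tok> stream, String... phrase) {
        List<int[]> hits = new ArrayList<>();
        if (phrase.length == 0) {
            return hits;
        }
        // Sum the increments to get the absolute position of every token.
        int[] pos = new int[stream.size()];
        int position = -1; // assumed convention: first increment 1 gives position 0
        for (int i = 0; i < stream.size(); i++) {
            position += stream.get(i).positionIncrement;
            pos[i] = position;
        }
        for (int i = 0; i + phrase.length <= stream.size(); i++) {
            boolean match = true;
            for (int k = 0; k < phrase.length && match; k++) {
                Tok t = stream.get(i + k);
                // Term must match and sit exactly k positions after the first term.
                match = t.term.equals(phrase[k]) && pos[i + k] == pos[i] + k;
            }
            if (match) {
                Tok first = stream.get(i);
                Tok last = stream.get(i + phrase.length - 1);
                hits.add(new int[] {first.startOffset, last.endOffset});
            }
        }
        return hits;
    }
}
```

With the tokens of "peter pan met peter", a search for "peter pan" should
give a single hit covering offsets 0-9, and the lone 'peter' at the end
should not be reported.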

Thanks so much in advance!
Karolina

--0016360e3d305a0aef049ae9e7ad--