lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: hit position
Date Tue, 07 Sep 2010 11:34:42 GMT
Lucene has no concept of line numbers, so you have to work them out
yourself. I've seen at least two ways to make that work:
1> Use payloads. A payload is just a bit of data you attach to each token;
     what you put in there is up to you, so you can encode this kind of
     information however you want. See:
     http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

2> You can do a similar sort of thing by recording the relevant information
     when you analyze a document and then including that data in a special
     (probably stored-only) field in your document, say the offset of the
     beginning of each line. This field would never be searched, just used
     to find out what line a hit was on. You can then find the line numbers
     once you know the term positions. Stealing from Grant:

See
http://www.lucidimagination.com/search/document/7fe40486bc935ce4/get_term_neighbours
(although I think you can do better than the code in the third reply by
using a TermVectorMapper such that you can process the TermVector as it
comes from disk.)

Essentially, you need to use a combination of SpanQuery, TermVector and
TermVectorMapper.
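
To make option 2 a bit more concrete, here is a minimal sketch in plain Java of
the line-start bookkeeping only. It does not touch the Lucene API. The class
name LineOffsets and the comma-separated encoding are my own assumptions for
illustration. It also assumes the offsets you get back for a hit (for example
from a term vector stored with offsets) are character offsets into the same
text that was scanned at index time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Lucene): maps character offsets to 1-based line numbers. */
final class LineOffsets {
    // Offset of the first character of each line, in ascending order.
    private final int[] lineStarts;

    private LineOffsets(int[] lineStarts) {
        this.lineStarts = lineStarts;
    }

    /** Scan the text once (at index time) and record where each line begins. */
    static LineOffsets fromText(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0); // the first line always starts at offset 0
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                starts.add(i + 1); // the next line starts right after the newline
            }
        }
        int[] arr = new int[starts.size()];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = starts.get(i);
        }
        return new LineOffsets(arr);
    }

    /** Serialize to a string suitable for a stored-only field, e.g. "0,11,17". */
    String encode() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lineStarts.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(lineStarts[i]);
        }
        return sb.toString();
    }

    /** Rebuild the table from the stored field value at search time. */
    static LineOffsets decode(String stored) {
        String[] parts = stored.split(",");
        int[] arr = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            arr[i] = Integer.parseInt(parts[i]);
        }
        return new LineOffsets(arr);
    }

    /** Return the 1-based line number containing the given character offset. */
    int lineOf(int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0");
        }
        int idx = Arrays.binarySearch(lineStarts, offset);
        // Exact match: idx is the 0-based line index.
        // No match: idx == -(insertionPoint) - 1, and the containing line is the
        // one just before the insertion point, whose 1-based number is insertionPoint.
        return idx >= 0 ? idx + 1 : -idx - 1;
    }
}
```

At index time you would store the result of encode() in the stored-only field;
at search time you would decode() it and call lineOf() with the start offset of
each hit. The lookup is a binary search, so it stays cheap even for documents
with many lines.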

HTH
Erick


On Mon, Sep 6, 2010 at 10:36 PM, Lev Bronshtein
<lev_bronshtein@hotmail.com> wrote:

>
> Now that I can index my data, I want to be able to search it and report
> some sort of position information with every hit, such as a line number or a
> byte offset within the stream.  Any idea how I can accomplish this?
