lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Getting position increments directly from the index
Date Thu, 23 May 2013 15:29:08 GMT
If you add a special "end of document" term, then some of these calculations
might be easier.

And give that special term a payload of the sentence count.

While you're at it, insert "end of sentence" terms that could carry a
payload of the sentence number.
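Here is a minimal, Lucene-free sketch of the token sequence such an analyzer might emit. The sentinel texts "_EOS_" and "_EOD_", the Token class, and the 0-based sentence numbering are all hypothetical choices. In a real index the payload would be bytes set on each sentinel token, for example by a custom TokenFilter using PayloadAttribute. Here it is kept as a plain int so the idea is visible.

```java
import java.util.ArrayList;
import java.util.List;

public class SentinelTokens {

    // Hypothetical sentinel term texts; choose strings the analyzer can never produce.
    static final String END_OF_SENTENCE = "_EOS_";
    static final String END_OF_DOCUMENT = "_EOD_";

    // One emitted token: text, position, and an optional int payload (null = none).
    static final class Token {
        final String text;
        final int position;
        final Integer payload;

        Token(String text, int position, Integer payload) {
            this.text = text;
            this.position = position;
            this.payload = payload;
        }

        @Override
        public String toString() {
            return payload == null
                    ? text + "@" + position
                    : text + "@" + position + "[" + payload + "]";
        }
    }

    // Flattens already-split sentences into one token stream. After each sentence
    // it appends an end-of-sentence sentinel whose payload is the 0-based sentence
    // number, and at the very end an end-of-document sentinel whose payload is the
    // total sentence count.
    static List<Token> withSentinels(List<List<String>> sentences) {
        List<Token> out = new ArrayList<>();
        int position = 0;
        for (int s = 0; s < sentences.size(); s++) {
            for (String word : sentences.get(s)) {
                out.add(new Token(word, position++, null));
            }
            out.add(new Token(END_OF_SENTENCE, position++, s));
        }
        out.add(new Token(END_OF_DOCUMENT, position, sentences.size()));
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> doc = List.of(List.of("the", "fox"), List.of("it", "ran"));
        // prints [the@0, fox@1, _EOS_@2[0], it@3, ran@4, _EOS_@5[1], _EOD_@6[2]]
        System.out.println(withSentinels(doc));
    }
}
```

The appeal of this design is that at search time the postings for the single "_EOD_" term hold exactly one entry per document. Its payload is then the sentence count, with no per-document collation of positions needed.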

-- Jack Krupansky
-----Original Message----- 
From: Michael McCandless
Sent: Thursday, May 23, 2013 10:39 AM
To: Lucene Users
Subject: Re: Getting position increments directly from the index

On Thu, May 23, 2013 at 9:54 AM, Igor Shalyminov
<ishalyminov@yandex-team.ru> wrote:

> But, just to clarify, is there a way to get, let's say, a vector of 
> position increments directly from the index, without re-parsing document 
> contents?

Term vectors (as Jack suggested) are one option, but they are very
heavy: they slow down indexing, take lots of disk space, and are slow
to load at search time (one seek per document).

You can enumerate all positions for each term/document pair in the
postings, but you'd then need to collate by document to get the max
position (i.e. the last term) for that document.  I guess an
int[maxDoc] would do the trick; then walk that array, dividing each
max position by 1000.  Or index the sentence token :)
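What follows is a small, self-contained sketch of that collation step. It is not Lucene code. The Posting class and the Map of term to postings are hypothetical stand-ins for walking each term's postings enum. The sketch also assumes the indexer places sentence i (0-based) entirely within positions [i * 1000, (i + 1) * 1000). Under that assumption, maxPosition / 1000 is the index of the last sentence, and the count is one more than that.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class SentenceCounts {

    // Hypothetical stand-in for one posting: a doc id plus the positions of the
    // term in that doc.
    static final class Posting {
        final int docId;
        final int[] positions;

        Posting(int docId, int[] positions) {
            this.docId = docId;
            this.positions = positions;
        }
    }

    // Visits every term's postings once, keeping the largest position per doc in
    // an int[maxDoc], then converts each max position to a sentence count.
    // ASSUMPTION: sentence i occupies positions [i * gap, (i + 1) * gap).
    static int[] sentenceCounts(Map<String, List<Posting>> postings, int maxDoc, int gap) {
        int[] maxPosition = new int[maxDoc];
        Arrays.fill(maxPosition, -1); // -1 means "no positions seen for this doc"
        for (List<Posting> termPostings : postings.values()) {
            for (Posting p : termPostings) {
                for (int pos : p.positions) {
                    if (pos > maxPosition[p.docId]) {
                        maxPosition[p.docId] = pos;
                    }
                }
            }
        }
        int[] counts = new int[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            // Last term lies in sentence maxPosition / gap (0-based); empty docs get 0.
            counts[doc] = maxPosition[doc] < 0 ? 0 : maxPosition[doc] / gap + 1;
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<Posting>> postings = Map.of(
                "quick", List.of(new Posting(0, new int[] {0, 1005}), new Posting(1, new int[] {3})),
                "fox", List.of(new Posting(0, new int[] {2010}), new Posting(1, new int[] {7})));
        // doc 0 ends at position 2010 -> 3 sentences; doc 1 -> 1; doc 2 is empty -> 0
        System.out.println(Arrays.toString(sentenceCounts(postings, 3, 1000))); // prints [3, 1, 0]
    }
}
```

The cost is a full pass over every term's positions, which is exactly what indexing a dedicated sentence or end-of-document token avoids.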

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org 

