lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Getting position increments directly from the index
Date Thu, 23 May 2013 14:39:00 GMT
On Thu, May 23, 2013 at 9:54 AM, Igor Shalyminov
<ishalyminov@yandex-team.ru> wrote:

> But, just to clarify, is there a way to get, let's say, a vector of position increments
> directly from the index, without re-parsing document contents?

Term vectors (as Jack suggested) are one option, but they are very
heavy: they slow down indexing, take lots of disk space, and are slow
to load at search time (one seek per document).

You can enumerate all positions for each term/doc pair in the postings,
but you'd then need to collate by document to get the max position
(last term) for that document. I guess an int[maxDoc] would do the
trick: fill it while walking the postings, then walk that array
dividing each maxPosition by 1000. Or index the sentence token :)
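
A minimal sketch of the collation step, in plain Java with no Lucene
dependency. The Posting class, the method names, and the sample data
are hypothetical stand-ins; in Lucene 4.x the doc IDs and positions
would instead come from walking a DocsAndPositionsEnum for each term.
The gap of 1000 is taken from the "dividing by 1000" remark and is
assumed here to be the position gap placed between sentences.

```java
import java.util.Arrays;

final class MaxPositionSketch {

    /** Hypothetical stand-in for one term/doc entry in the postings. */
    static final class Posting {
        final int docId;
        final int[] positions; // positions of the term within the document

        Posting(int docId, int... positions) {
            this.docId = docId;
            this.positions = positions;
        }
    }

    /**
     * Collates positions by document: postings arrive grouped by term,
     * so every term must be visited before a document's max is known.
     * Returns -1 for documents that have no tokens in the field.
     */
    static int[] maxPositionPerDoc(int maxDoc, Posting[][] postingsByTerm) {
        int[] maxPosition = new int[maxDoc];
        Arrays.fill(maxPosition, -1);
        for (Posting[] termPostings : postingsByTerm) {
            for (Posting p : termPostings) {
                for (int pos : p.positions) {
                    if (pos > maxPosition[p.docId]) {
                        maxPosition[p.docId] = pos;
                    }
                }
            }
        }
        return maxPosition;
    }

    /** Walks the array, dividing each max position by the assumed gap. */
    static int[] divideByGap(int[] maxPosition, int gap) {
        int[] out = new int[maxPosition.length];
        for (int i = 0; i < maxPosition.length; i++) {
            out[i] = maxPosition[i] < 0 ? 0 : maxPosition[i] / gap;
        }
        return out;
    }

    public static void main(String[] args) {
        // Two terms over three documents; document 2 has no tokens.
        Posting[][] postings = {
            { new Posting(0, 0, 1001), new Posting(1, 0) },
            { new Posting(0, 1, 2002) },
        };
        int[] maxPos = maxPositionPerDoc(3, postings);
        System.out.println(Arrays.toString(divideByGap(maxPos, 1000)));
    }
}
```

The int[maxDoc] array costs four bytes per document and needs one full
pass over the field's postings, which is the trade-off against loading
term vectors document by document.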

Mike McCandless

http://blog.mikemccandless.com
