lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Problem with TermVector offsets and positions not being preserved
Date Sat, 21 Jul 2012 00:59:29 GMT
On Fri, Jul 20, 2012 at 8:24 PM, Mike O'Leary <tmoleary@uw.edu> wrote:
> Hi Robert,
> I'm not trying to determine whether a document has term vectors, I'm trying to determine
> whether the term vectors that are in the index have offsets and positions stored.

Right: what I'm trying to tell you is that storing offsets and positions
is not an index-wide setting for a field: it's per-document.

I think all the tools you are using to check these values are not
doing it correctly:
1. DumpIndex is wrongly using values from the Document returned by
IndexReader.document(), but that doesn't and never did retrieve these
values (it would be 2 extra disk seeks per document to figure out the
term vector flags)
2. I haven't looked at Luke, but it's probably printing the "global"
bits from FieldInfos. It used to be that we wrote some bits for these
options. I don't even know what the purpose was: since these options
can be turned on/off at a per-document level, index-wide bits make no
sense. Because of this we stopped writing these bits in 3.6 (we only
record in FieldInfos whether the field has any term vectors at all),
and that's probably what's confusing you there.

Again, if you really want to validate that a specific document has
offsets/positions in its term vectors, you need to check that specific
document with IndexReader.getTermFreqVector. There is no other way,
since this can be controlled on a per-document basis for a field.
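
For what it's worth, a rough sketch of that per-document check against
the 3.x API might look like the following. The class name
TermVectorCheck and the method describe are made up for illustration;
the Lucene calls are IndexReader.getTermFreqVector,
TermPositionVector.getTermPositions and TermPositionVector.getOffsets.
It assumes that looking at the first term of the vector is enough,
since positions/offsets are stored for all terms of a vector or for
none.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.index.TermPositionVector;

/** Hypothetical helper, not part of Lucene: inspects one document's term vector. */
public final class TermVectorCheck {

    private TermVectorCheck() {}

    /**
     * Describes what is stored in the term vector of one field of one document:
     * "none", "terms", "terms+positions", "terms+offsets"
     * or "terms+positions+offsets".
     */
    public static String describe(IndexReader reader, int docId, String field)
            throws IOException {
        // Returns null if this document has no term vector for the field.
        TermFreqVector tfv = reader.getTermFreqVector(docId, field);
        if (tfv == null) {
            return "none";
        }
        // Vectors without positions and offsets are not TermPositionVectors.
        if (!(tfv instanceof TermPositionVector) || tfv.size() == 0) {
            return "terms";
        }
        TermPositionVector tpv = (TermPositionVector) tfv;
        // Both accessors may return null when that kind of data was not stored.
        boolean positions = tpv.getTermPositions(0) != null;
        boolean offsets = tpv.getOffsets(0) != null;

        StringBuilder sb = new StringBuilder("terms");
        if (positions) {
            sb.append("+positions");
        }
        if (offsets) {
            sb.append("+offsets");
        }
        return sb.toString();
    }
}
```

You would call TermVectorCheck.describe(reader, docId, "yourField") for
each document you care about (skipping deleted docs), rather than
trusting anything reported at the field level.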


-- 
lucidimagination.com
