lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Problem with TermVector offsets and positions not being preserved
Date Fri, 27 Jul 2012 13:10:46 GMT
On 27/07/2012 00:50, Mike O'Leary wrote:
> Hi Robert,
> Thanks for your help. This cleared up all of the things I was having trouble understanding about offsets and positions in term vectors.
> Mike
>
> -----Original Message-----
> From: Robert Muir [mailto:rcmuir@gmail.com]
> Sent: Friday, July 20, 2012 5:59 PM
> To: java-user@lucene.apache.org
> Subject: Re: Problem with TermVector offsets and positions not being preserved
>
> On Fri, Jul 20, 2012 at 8:24 PM, Mike O'Leary <tmoleary@uw.edu> wrote:
>> Hi Robert,
>> I'm not trying to determine whether a document has term vectors, I'm trying to determine whether the term vectors that are in the index have offsets and positions stored.
>
> Right: what I'm trying to tell you is that storing offsets and positions is not an index-wide setting for a field: it's per-document.
>
> I think all the tools you are using to check these values are not doing it correctly:
> 1. DumpIndex is wrongly using values from the Document returned by IndexReader.document(), but that doesn't and never did retrieve these values (it would be 2 extra disk seeks per document to figure out the term vector flags).
> 2. I haven't looked at Luke, but it's probably printing the "global" bits from FieldInfos. It used to be that we wrote some bits for these options. I don't even know what the purpose was, since these options can be turned on or off at a per-document level: they make no sense.
> Because of this we stopped writing these bits in 3.6 (we only write into FieldInfos if the field has any term vectors at all), and that's probably what's confusing you there.

Catching up with this thread ... Luke 4.0-ALPHA makes a similar mistake. 
I fixed this in svn (to be released in a week or so) so that:

* Luke now actually checks whether a doc has term vectors for a 
particular field and adjusts the field flags based on the 
presence/absence of a term vector. FieldInfos were not enough to handle 
some combinations.

* Luke doesn't show the offsets/positions flags in the document view, 
since they are not known in advance. However, the pop-up that shows a 
term vector correctly shows positions and offsets if available (or 
blanks if not available).
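
The per-document check described above can be sketched roughly as follows. This is a toy model in plain Java, not Luke's code and not the Lucene API: the names TermVectorFlags, Doc, TermVector and describe are all hypothetical, and the only point is that the flags are derived from the term vector actually stored with a document rather than from index-wide field metadata.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not the Lucene API) of per-document term vector flags. */
public class TermVectorFlags {

    /** Stand-in for one stored term vector; a null array means "not stored". */
    public static final class TermVector {
        public final int[] positions;  // null when positions were not stored
        public final int[][] offsets;  // null when offsets were not stored; else {start, end} pairs

        public TermVector(int[] positions, int[][] offsets) {
            this.positions = positions;
            this.offsets = offsets;
        }
    }

    /** Stand-in for a document: field name -> term vector (no entry if none was stored). */
    public static final class Doc {
        public final Map<String, TermVector> vectors = new HashMap<>();
    }

    /** Derives the flags from the document itself, not from global field info. */
    public static String describe(Doc doc, String field) {
        TermVector tv = doc.vectors.get(field);
        if (tv == null) {
            return "no term vector";
        }
        StringBuilder sb = new StringBuilder("term vector");
        if (tv.positions != null) {
            sb.append(" +positions");
        }
        if (tv.offsets != null) {
            sb.append(" +offsets");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two documents with the same field but different per-document settings.
        Doc a = new Doc();
        a.vectors.put("body", new TermVector(new int[] {0, 1}, new int[][] {{0, 5}, {6, 11}}));
        Doc b = new Doc();
        b.vectors.put("body", new TermVector(null, null));

        System.out.println(describe(a, "body"));
        System.out.println(describe(b, "body"));
        System.out.println(describe(b, "title"));
    }
}
```

In this model two documents can report different flags for the same field, which is why a single field-level flag cannot describe them both.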


-- 
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
  ___.,___,___,___,_._. __________________<><____________________
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com



