lucene-java-user mailing list archives

From: "Mike O'Leary" <tmole...@uw.edu>
Subject: Problem with TermVector offsets and positions not being preserved
Date: Thu, 19 Jul 2012 23:16:52 GMT

I created an index using Lucene 3.6.0 in which I specified that a certain text field in each
document should be indexed, stored, analyzed with no norms, with term vectors, offsets and
positions. Later I looked at that index in Luke, and it said that term vectors were created
for this field, but offsets and positions were not. The code I used for indexing couldn't
be simpler. It looks like this for the relevant field:

doc.add(new Field("ReportText", reportTextContents, Field.Store.YES, Field.Index.ANALYZED_NO_NORMS,
        Field.TermVector.WITH_POSITIONS_OFFSETS));

The indexer adds these documents to the index and commits them. I ran the indexer in a debugger
and watched the Lucene code set the Field instance variables called storeTermVector, storeOffsetWithTermVector
and storePositionWithTermVector to true for this field.

When the indexing was done, I ran a simple program in a debugger that opens an index, reads
each document and writes out its information as XML. The values of storeOffsetWithTermVector
and storePositionWithTermVector in the ReportText Field objects were false. Is there something
other than specifying Field.TermVector.WITH_POSITIONS_OFFSETS when constructing a Field that
needs to be done in order for offsets and positions to be saved in the index? Or are there
circumstances under which the Field.TermVector setting for a Field object is ignored? This
doesn't make sense to me, and I could swear that offsets and positions were being saved in
some older indexes I created that I unfortunately no longer have around for comparison. I'm
sure that I am just overlooking something or have made some kind of mistake, but I can't see
what it is at the moment. Thanks for any help or advice you can give me.
Mike
