lucene-java-user mailing list archives

From Jon Stewart <...@lightboxtechnologies.com>
Subject Document term vectors in Lucene 4
Date Thu, 17 Jan 2013 05:52:09 GMT
Hello,

I cannot extract document term vectors from an index, and some
determined googling has not turned up much. In short, when I call
IndexReader.getTermVector(docID, field), or call
IndexReader.getTermVectors(docID) and then navigate down to the Terms
for the specified field, I get a null result.

// Indexing:
  String bodyText = "this is foobar";
  final FieldType BodyOptions = new FieldType();
  BodyOptions.setIndexed(true);
  BodyOptions.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
  BodyOptions.setStored(true);
  BodyOptions.setStoreTermVectors(true);
  BodyOptions.setTokenized(true);
  Document doc = new Document();
  doc.add(new Field("body", bodyText, BodyOptions));

When I examine docs in Luke, I can see the term vectors.

// Retrieving (at a later time)
  DirectoryReader dirRdr = DirectoryReader.open(FSDirectory.open(new File(path)));
  SlowCompositeReaderWrapper rdr = new SlowCompositeReaderWrapper(dirRdr);
  for (int i = 0; i < rdr.maxDoc(); ++i) {
    int numTerms = 0;
    Terms terms = rdr.getTermVector(i, "body");
    if (terms != null) {
      TermsEnum term = terms.iterator(null);
      while (term.next() != null) {
        ++numTerms;
      }
      System.out.println("doc " + i + " had " + numTerms + " terms");
    }
    else {
      System.err.println("null term vector on doc " + i);
    }
  }

On every doc, the Terms object I get back from getTermVector(i, "body") is null.


Jon
--
Jon Stewart, Principal
(646) 719-0317 | jon@lightboxtechnologies.com | Arlington, VA
