lucene-java-user mailing list archives

From Piotr Pęzik <piotr.pe...@gmail.com>
Subject TermVectors and Attributes in Lucene 4.0
Date Mon, 17 Dec 2012 12:15:43 GMT
Hi,

I've been trying to enumerate all terms in all documents of a Lucene
4.0 index in order to retrieve their attributes (payloads, positions,
etc.).

I have an index with documents containing stored, tokenized fields with
term vectors, offsets and payloads. Below is what I have tried so far
(I have to admit I don't fully understand this part of the 4.0 API yet).

My question is: can I use TermsEnum, DocsEnum or DocsAndPositionsEnum
to access each term in each document and get its attributes? They all
have the .attributes() method, but so far I haven't managed to make it
return the actual attributes of individual terms (not even the
CharTermAttribute).


Thanks,

Piotr Pezik


//Checking field type:

Document doc = dReader.document(1);
System.out.println(doc.getField("myField").fieldType());
//=> stored,indexed,tokenized,termVector,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

//Getting Terms and TermsEnum:

Terms terms = SlowCompositeReaderWrapper
                 .wrap(dReader).terms("myField");
TermsEnum tenum = terms.iterator(TermsEnum.EMPTY);

//Moving to the next term (?)

BytesRef br = tenum.next();

System.out.println(tenum.attributes().hasAttributes());

//=>FALSE

System.out.println(tenum.attributes().getAttribute(PositionIncrementAttribute.class)); 


// => java.lang.IllegalArgumentException: This AttributeSource does not
//    have the attribute
//    'org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute'.

Bits liveDocs = SlowCompositeReaderWrapper.wrap(dReader).getLiveDocs();


DocsEnum denum  = tenum.docs(liveDocs, null);
denum.nextDoc();
System.out.println(denum.attributes().hasAttributes());

//=>FALSE

DocsAndPositionsEnum denum2  = tenum.docsAndPositions(liveDocs, null);
denum2.nextDoc();
System.out.println(denum2.attributes().hasAttributes());

//=>FALSE



