you should call currDocsAndPositions.nextPosition() before you call
currDocsAndPositions.getPayload() payloads are per positions so you
need to advance the pos first!
simon
On Mon, Oct 29, 2012 at 6:44 PM, Ivan Vasilev <ivasilev@sirma.bg> wrote:
> Hi Guys,
>
> I use the following code to index documents and set Payloads to term
> positions:
>
> public class TestPayloads_ {
> private static final String INDEX_DIR =
> "E:/Temp/Index";
>
> public static void main(String[] args) throws Exception {
> IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, new
> MyAnalyzer_());
> iwc.setOpenMode(OpenMode.CREATE);
> IndexWriter writer = new IndexWriter(FSDirectory.open(new
> File(INDEX_DIR)), iwc);
>
> FieldType fieldType = new FieldType();
> IndexOptions indexOptions =
> IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
> fieldType.setIndexOptions(indexOptions);
> fieldType.setIndexed(true);
> fieldType.setOmitNorms(true);
> fieldType.setStored(true);
> fieldType.freeze();
>
> Document doc = new Document();
> doc.add(new Field("content", "one two three four.", fieldType));
> writer.addDocument(doc);
>
> writer.addDocument(doc);
> writer.addDocument(doc);
>
> writer.close();
>
> DirectoryReader dr = DirectoryReader.open(FSDirectory.open(new
> File(INDEX_DIR)));
> AtomicReader sr = dr.leaves().get(0).reader();
>
> Bits liveDocs = sr.getLiveDocs();
> Fields fields = sr.fields();
> for (String currFieldName : fields) {
> Terms currTerms = fields.terms(currFieldName);
> TermsEnum currTermEnum = currTerms.iterator(null);
> boolean currTermsHasPayloads = currTerms.hasPayloads();
> BytesRef currFieldValue;
> while ((currFieldValue = currTermEnum.next()) != null) {
> String currVfieldValueStr = currFieldValue.utf8ToString();
> // DocsEnum currDocsEnum = currTermEnum.docs(liveDocs,
> null);
> DocsAndPositionsEnum currDocsAndPositions =
> currTermEnum.docsAndPositions(liveDocs, null,
> DocsAndPositionsEnum.FLAG_PAYLOADS
> | DocsAndPositionsEnum.FLAG_OFFSETS);
> int docID;
> while ((docID = currDocsAndPositions.nextDoc()) !=
> DocsEnum.NO_MORE_DOCS) {
> int freq = currDocsAndPositions.freq();
> for (int i = 0; i < freq; i++) {
> byte payload;
> if (currTermsHasPayloads &&
> currDocsAndPositions.getPayload() != null) {
> payload =
> currDocsAndPositions.getPayload().bytes[0];
> } else {
> payload = -1;
> }
> System.out.println("Term: (" + currFieldName + ":" +
> currVfieldValueStr + "); doc: "
> + docID + "; position: " +
> currDocsAndPositions.nextPosition()
> + "; payload: " + payload);
> }
> }
> }
> }
> dr.close();
> }
>
> }
>
> class MyAnalyzer_ extends Analyzer {
> @Override
> protected TokenStreamComponents createComponents(String fieldName,
> Reader reader) {
> Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_40,
> reader);
> return new TokenStreamComponents(tokenizer, new
> MyFilter_(tokenizer));
> }
>
> }
>
> class MyFilter_ extends TokenFilter {
> private PayloadAttribute payloadAttr;
> private byte[] payloadVal;
>
> MyFilter_(TokenStream in) {
> super(in);
> payloadAttr = addAttribute(PayloadAttribute.class);
> payloadVal = new byte[1];
> }
> public final boolean incrementToken() throws IOException {
> if (input.incrementToken()) {
> payloadVal[0]++;
> payloadAttr.setPayload(new BytesRef(payloadVal));
> return true;
> } else {
> return false;
> }
> }
> }
>
>
>
>
> The output is the following:
>
> Term: (content:four); doc: 0; position: 3; payload: -1
> Term: (content:four); doc: 1; position: 3; payload: 4
> Term: (content:four); doc: 2; position: 3; payload: 8
> Term: (content:one); doc: 0; position: 0; payload: -1
> Term: (content:one); doc: 1; position: 0; payload: 1
> Term: (content:one); doc: 2; position: 0; payload: 5
> Term: (content:three); doc: 0; position: 2; payload: -1
> Term: (content:three); doc: 1; position: 2; payload: 3
> Term: (content:three); doc: 2; position: 2; payload: 7
> Term: (content:two); doc: 0; position: 1; payload: -1
> Term: (content:two); doc: 1; position: 1; payload: 2
> Term: (content:two); doc: 2; position: 1; payload: 6
>
>
> The payloads of document with Lucene ID #0 were not added. Payloads that
> were intended to doc #0 were added to doc #1, those intended for doc #1 were
> added to doc #2.
> With the debugger I see that during adding doc #0 payloadVal is incremented
> form 1 to 4, and after each incrementation is invoked
> payloadAttr.setPayload(..), but strangely when reading
> DocsAndPositionsEnumwe see those payloads (1 to 4) belong actually to doc
> #1.
>
> Do I make some mistake with invoking setPayload(..) method or it is a bug?
>
> Cheers,
> Ivan Vasilev
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
|