MIME-Version: 1.0
Date: Wed, 11 Sep 2013 18:54:55 +0530
Message-ID: 
 <CA+40OKJ88ki9jS6HfeMF-Y23jjUM+FYq+6sadb2jvxeHdHf62A@mail.gmail.com>
Subject: Retrieving attributes of terms in lucene
From: nischal reddy <nischal.srinivas@gmail.com>
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=047d7b624d341693a804e61b8cc0

--047d7b624d341693a804e61b8cc0
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I have written a custom Tokenizer that splits my input text into tokens.
I have overridden the incrementToken method, where I set the
CharTermAttribute, OffsetAttribute, TypeAttribute and PayloadAttribute
(the method is below).

    @Override
    public final boolean incrementToken() throws IOException {
        clearAttributes();
        // Bind the lexer to the Tokenizer's input on first use.
        if (reader == null) {
            reader = input;
            initProgressLexer();
        }
        TokenType myObj = next();
        if (myObj != null) {
            charTermAttribute.append(myObj.tokenText);
            offsetAttribute.setOffset(myObj.startOffset, myObj.endOffset);
            typeAttribute.setType(myObj.type);
            // Copy the type into the payload as well.
            payloadAttribute.setPayload(new BytesRef(myObj.type.getBytes()));
            return true;
        } else {
            return false;
        }
    }

Now, when I search for text in my index, I want to retrieve the type,
offsets and CharTermAttribute value of each matched token.

To do this, I use the matched documents to obtain a DocsAndPositionsEnum
and then call startOffset() and endOffset() to retrieve the offsets, and
getPayload() to get the payload. However, I am not able to retrieve the
type and CharTermAttribute values of the matched terms. Below is the
method where I retrieve the offsets.

private void showHits(TermQuery query, TopDocs hits)
        throws CorruptIndexException, IOException {
    ProgressSearchEngine.debug("Found " + hits.totalHits
            + " document(s) that matched query '" + query.toString() + "':");
    for (ScoreDoc scoreDoc : hits.scoreDocs) {
        // Get the document
        Document doc = iSearcher.doc(scoreDoc.doc);
        ProgressSearchEngine.debug("File Name:: " + doc.get(FIELD_FILE_PATH));
        // Get the term vector of the queried field for that document
        Terms termsVector = iReader.getTermVector(scoreDoc.doc,
                query.getTerm().field());

        if (termsVector != null) {
            TermsEnum termsEnum = null;
            termsEnum = termsVector.iterator(termsEnum);
            // Seek to the exact position of the matched term
            if (termsEnum.seekExact(new BytesRef(query.getTerm().text()), false)) {
                DocsAndPositionsEnum dpEnum = null;
                dpEnum = termsEnum.docsAndPositions(null, dpEnum);

                if (dpEnum != null) {
                    // You need to call nextDoc() to have the enum positioned.
                    if (dpEnum.nextDoc() == 0) {
                        int freq = dpEnum.freq();
                        for (int i = 0; i < freq; ++i) {
                            int position = dpEnum.nextPosition();
                            if (position != -1) {
                                String filePath = doc.get(FIELD_FILE_PATH);
                                System.out.println("file path " + filePath);
                                System.out.println("Start offset "
                                        + dpEnum.startOffset() + " End offset "
                                        + dpEnum.endOffset());
                            }
                        }
                    } else {
                        ProgressSearchEngine.debug(
                                "Not able to find the offsets for the file: "
                                + doc.get(FIELD_FILE_PATH));
                    }
                }
            }
        }
    }
}

Can someone please help me retrieve all the attributes that I set in the
incrementToken method?

Also, can we add our own attributes in addition to the ones already
available? If yes, how?
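
To frame the question a bit more: as far as I understand, the index keeps
the term text, positions, offsets and payloads, but not the TypeAttribute,
which is why I copy the type into the payload in incrementToken. Below is a
minimal sketch of how I would encode and decode that payload using only the
JDK. TypePayloadCodec and the "KEYWORD" type are made-up names for
illustration, not part of Lucene.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not Lucene API): packs a token type into payload bytes. */
final class TypePayloadCodec {
    private TypePayloadCodec() {}

    /** Encode the type as UTF-8 so index time and search time agree on the charset. */
    static byte[] encode(String type) {
        return type.getBytes(StandardCharsets.UTF_8);
    }

    /** Decode a (bytes, offset, length) slice, the same shape a BytesRef exposes. */
    static String decode(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = encode("KEYWORD"); // "KEYWORD" is an assumed example type
        System.out.println(decode(payload, 0, payload.length));
    }
}
```

At search time I would expect to pass the bytes, offset and length fields of
the BytesRef returned by dpEnum.getPayload() to decode(), assuming the
payload was stored with positions in the term vector.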

TIA,
Nischal Y

--047d7b624d341693a804e61b8cc0--