Date: Wed, 11 Sep 2013 18:54:55 +0530
Subject: Retrieving attributes of terms in lucene
From: nischal reddy
To: java-user@lucene.apache.org

Hi,

I have written a custom Tokenizer that splits my input text into tokens. I have overridden the incrementToken() method, where I set the CharTermAttribute, OffsetAttribute and TypeAttribute, and also store the type as a payload (please find the method below):

    @Override
    final public boolean incrementToken() throws IOException {
        clearAttributes();
        // Lazily initialise the lexer on the first call
        if (reader == null) {
            reader = input;
            initProgressLexer();
        }
        TokenType myObj = null;
        if ((myObj = next()) != null) {
            charTermAttribute.append(myObj.tokenText);
            offsetAttribute.setOffset(myObj.startOffset, myObj.endOffset);
            typeAttribute.setType(myObj.type);
            // The token type is also stored as the payload
            payloadAttribute.setPayload(new BytesRef(myObj.type.getBytes()));
            return true;
        } else {
            return false;
        }
    }

Now, when I search for a text in my index, I want to retrieve the type, offset and CharTermAttribute of the matched tokens. To achieve this, I use the matched documents to retrieve the DocsAndPositionsEnum object, then call startOffset() and endOffset() to retrieve the offsets and getPayload() to get the payload. However, I am not able to retrieve the type and CharTermAttribute values of the matched terms.

Below is the method where I retrieve the offsets:
    private void showHits(TermQuery query, TopDocs hits)
            throws CorruptIndexException, IOException {
        ProgressSearchEngine.debug("Found " + hits.totalHits
                + " document(s) that matched query '" + query.toString() + "':");
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            // Get the document
            Document doc = iSearcher.doc(scoreDoc.doc);
            ProgressSearchEngine.debug("File Name:: " + doc.get(FIELD_FILE_PATH));

            // Get the terms of that document
            Terms termsVector = iReader.getTermVector(scoreDoc.doc,
                    query.getTerm().field());
            if (termsVector != null) {
                TermsEnum termsEnum = null;
                termsEnum = termsVector.iterator(termsEnum);
                // Seek to the exact position of the matched term
                if (termsEnum.seekExact(new BytesRef(query.getTerm().text()), false)) {
                    DocsAndPositionsEnum dpEnum = null;
                    dpEnum = termsEnum.docsAndPositions(null, dpEnum);
                    if (dpEnum != null) {
                        // You need to call nextDoc() to have the enum positioned
                        if (dpEnum.nextDoc() == 0) {
                            int freq = dpEnum.freq();
                            for (int i = 0; i < freq; ++i) {
                                int position = dpEnum.nextPosition();
                                if (position != -1) {
                                    String filePath = doc.get(FIELD_FILE_PATH);
                                    System.out.println("file path " + filePath);
                                    System.out.println("Start offset " + dpEnum.startOffset()
                                            + " End offset " + dpEnum.endOffset());
                                }
                            }
                        } else {
                            ProgressSearchEngine.debug(
                                    "Not able to find the offsets for the file: "
                                    + doc.get(FIELD_FILE_PATH));
                        }
                    }
                }
            }
        }
    }

Can someone please help me get all the attributes that we set in the incrementToken() method? And can we add our own attributes apart from the ones already available? If yes, how?

TIA,
Nischal Y
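
P.S. Since incrementToken() above already stores the token type as the payload, one possible way to get the type back at search time would be to decode the bytes that getPayload() returns. Below is a minimal, Lucene-free sketch of that round trip. ByteSlice, encodeType and decodeType are hypothetical names used only for illustration; ByteSlice merely mimics the bytes/offset/length layout of BytesRef, and the explicit UTF-8 charset is an assumption (the tokenizer above uses the platform default).

```java
import java.nio.charset.StandardCharsets;

public class PayloadRoundTrip {

    // Hypothetical stand-in for a payload slice: a byte array plus the
    // offset and length of the valid region inside it.
    static final class ByteSlice {
        final byte[] bytes;
        final int offset;
        final int length;

        ByteSlice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    // Index-time side: turn the token type into payload bytes.
    static byte[] encodeType(String type) {
        return type.getBytes(StandardCharsets.UTF_8);
    }

    // Search-time side: decode only the valid region of the slice,
    // because the backing array may be larger than the payload itself.
    static String decodeType(ByteSlice payload) {
        return new String(payload.bytes, payload.offset, payload.length,
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = encodeType("IDENTIFIER");

        // Simulate a payload that sits in the middle of a larger shared buffer.
        byte[] shared = new byte[encoded.length + 4];
        System.arraycopy(encoded, 0, shared, 2, encoded.length);
        ByteSlice payload = new ByteSlice(shared, 2, encoded.length);

        System.out.println(decodeType(payload)); // prints "IDENTIFIER"
    }
}
```

The point of the sketch is that the offset and length have to be respected when decoding, and that the same charset should be used on both the indexing and the search side.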