From: Uwe Schindler
Subject: RE: How to export lucene index to a simple text file?
Date: Tue, 21 Sep 2010 09:29:55 -0700
Message-ID: <000501cb59aa$3b82eea0$b288cbe0$@thetaphi.de>

Hi,

Retrieve a TermEnum from the IndexReader and iterate over it. That gives you
every term together with its docFreq, which is the second column in your
table. Then, for each term, position a TermDocs enum on that term to get all
of its document IDs. See the Javadocs of IndexReader, TermEnum, and TermDocs
for details.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Sahin Buyrukbilen [mailto:sahin.buyrukbilen@gmail.com]
> Sent: Tuesday, September 21, 2010 9:12 AM
> To: java-user@lucene.apache.org
> Subject: How to export lucene index to a simple text file?
>
> Hi,
>
> I am currently working on a project about private information retrieval
> and I need to have an inverted index file in txt format as follows:
>
> Term t    freq t    Inverted list for t
> ---------------------------------------------
> and       1         <6, 0.159>
> big       2         <2, 0.148> <3, 0.088>
> dark      1         <6, 0.079>
> .
> .
> .
> .
>
> Here each pair indicates: number1 is the doc ID where term t exists, with
> a rank of number2.
>
> I have created an index from 5492 txt files; however, the index is
> composed of several files and most of the data is not in text format.
>
> Could somebody guide me to achieve this?
>
> Thank you
>
> Sahin.
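
For anyone reading this thread later, the two-level iteration described in the reply (an outer loop over terms, an inner loop over each term's documents) can be sketched roughly as below. This is only an illustration of the dump loop and the output layout: it uses plain Java collections as a stand-in for the index and makes no Lucene calls. The names InvertedIndexDump, Posting, and dump are hypothetical. In a real exporter the outer loop would presumably be driven by TermEnum and the inner loop by TermDocs. Note that TermDocs exposes a document ID and a within-document frequency, not a rank, so the second number in each pair is assumed here to be the raw frequency; a rank like the one in the requested table would have to be computed separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InvertedIndexDump {

    /** One posting: a document ID and the term's frequency in that document. */
    public static final class Posting {
        final int docId;
        final int freq;

        public Posting(int docId, int freq) {
            this.docId = docId;
            this.freq = freq;
        }
    }

    /**
     * Writes one tab-separated line per term: the term, its document
     * frequency, then one <docId, freq> pair per document.
     */
    public static String dump(Map<String, List<Posting>> index) {
        StringBuilder out = new StringBuilder();
        // Outer loop: one pass over all terms (the role TermEnum would play).
        for (Map.Entry<String, List<Posting>> entry : index.entrySet()) {
            List<Posting> postings = entry.getValue();
            // Document frequency = number of documents containing the term.
            out.append(entry.getKey()).append('\t').append(postings.size());
            // Inner loop: the documents for this term (the role of TermDocs).
            for (Posting p : postings) {
                out.append('\t').append('<').append(p.docId)
                   .append(", ").append(p.freq).append('>');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data; a TreeMap keeps the terms sorted.
        Map<String, List<Posting>> index = new TreeMap<String, List<Posting>>();
        List<Posting> and = new ArrayList<Posting>();
        and.add(new Posting(6, 1));
        index.put("and", and);
        List<Posting> big = new ArrayList<Posting>();
        big.add(new Posting(2, 3));
        big.add(new Posting(3, 1));
        index.put("big", big);
        System.out.print(dump(index));
    }
}
```

Building the whole output in a StringBuilder is fine for a small sample; for an index built from several thousand files it would likely be better to write each line straight to a file as it is produced.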