From: Adriano Crestani
Date: Wed, 22 Sep 2010 05:40:07 -0400
Subject: Re: How to export lucene index to a simple text file?
To: java-user@lucene.apache.org

> Saving the index in text format would also be a fun codec (in 4.0) to create :)

A codec like that would be welcome :)

On Wed, Sep 22, 2010 at 5:31 AM, Michael McCandless wrote:
> Saving the index in text format would also be a fun codec (in 4.0) to create :)
>
> I.e., the codec would be read/write. The performance wouldn't be great,
> but it'd be neat for debugging, teaching, transparency purposes...
>
> Mike
>
> On Tue, Sep 21, 2010 at 9:26 PM, Lance Norskog wrote:
>> The Lucene CheckIndex program opens an index and walks all of the data
>> structures. It is a good start for you.
>>
>> Sahin Buyrukbilen wrote:
>>>
>>> Thank you Uwe, I will read the docs and try to do it, however do you have
>>> example code? I need it because I am not very familiar with Java.
>>>
>>> Thank you.
>>>
>>> Sahin
>>>
>>> On Tue, Sep 21, 2010 at 12:29 PM, Uwe Schindler wrote:
>>>
>>>> Hi,
>>>>
>>>> Retrieve a TermEnum and iterate it. By that you get all terms and can
>>>> retrieve the docFreq, which is the second column in your table. Finally,
>>>> for each term you position the TermDocs enum on this term to get all
>>>> document ids.
>>>> Read the docs of IndexReader/TermEnum/TermDocs about this.
>>>>
>>>> Uwe
>>>>
>>>> -----
>>>> Uwe Schindler
>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>> http://www.thetaphi.de
>>>> eMail: uwe@thetaphi.de
>>>>
>>>>> -----Original Message-----
>>>>> From: Sahin Buyrukbilen [mailto:sahin.buyrukbilen@gmail.com]
>>>>> Sent: Tuesday, September 21, 2010 9:12 AM
>>>>> To: java-user@lucene.apache.org
>>>>> Subject: How to export lucene index to a simple text file?
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am currently working on a project about private information retrieval
>>>>> and I need to have an inverted index file in txt format as follows:
>>>>>
>>>>> Term t    freq t    Inverted list for t
>>>>> -------------------------------------------------------------------------
>>>>> and       1         <6, 0.159>
>>>>> big       2         <2, 0.148>  <3, 0.088>
>>>>> dark      1         <6, 0.079>
>>>>> .
>>>>> .
>>>>> .
>>>>> .
>>>>>
>>>>> Here each <number1, number2> pair indicates: number1 is the ID of a doc
>>>>> where term t exists, with a rank of number2.
>>>>>
>>>>> I have created an index from 5492 txt files, however the index is composed
>>>>> of different files and most of the data is not in text format.
>>>>>
>>>>> Could somebody guide me to achieve this?
>>>>>
>>>>> Thank you
>>>>>
>>>>> Sahin.
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
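
The traversal Uwe describes in the thread (iterate every term, read its docFreq, then walk the documents for that term) can be sketched in plain Java without Lucene on the classpath. This is only an illustration under stated assumptions: the class InvertedIndexExport and its methods are hypothetical, the in-memory TreeMap stands in for a real index, and the "rank" written for each document is simply term count divided by document length, which is not Lucene's scoring. In Lucene 3.x the corresponding calls would be roughly IndexReader.terms() for the sorted term enumeration, TermEnum.docFreq() for the second column, and IndexReader.termDocs(Term) for the document ids; check the Javadocs of your version before relying on that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for an index: term -> (docId -> weight), both kept sorted. */
final class InvertedIndexExport {

    /** Builds the postings. The weight (term count / doc length) is an assumed rank. */
    static Map<String, Map<Integer, Double>> build(List<String> docs) {
        Map<String, Map<Integer, Double>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            // Naive tokenizer: lowercase, split on non-word characters.
            List<String> tokens = new ArrayList<>();
            for (String token : docs.get(docId).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
            for (String token : tokens) {
                index.computeIfAbsent(token, t -> new TreeMap<>())
                     .merge(docId, 1.0 / tokens.size(), Double::sum);
            }
        }
        return index;
    }

    /** Renders one tab-separated line per term: term, docFreq, <docId, weight> pairs. */
    static String export(Map<String, Map<Integer, Double>> index) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Map<Integer, Double>> term : index.entrySet()) {
            Map<Integer, Double> postings = term.getValue();
            // docFreq is the number of documents that contain the term.
            out.append(term.getKey()).append('\t').append(postings.size()).append('\t');
            StringBuilder list = new StringBuilder();
            for (Map.Entry<Integer, Double> p : postings.entrySet()) {
                if (list.length() > 0) {
                    list.append(' ');
                }
                list.append(String.format(Locale.ROOT, "<%d, %.3f>", p.getKey(), p.getValue()));
            }
            out.append(list).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the big dark room", "big and bright");
        System.out.print(export(build(docs)));
    }
}
```

Sorted maps are used so that terms and document ids come out in a stable order, similar to how a term enumeration walks terms in sorted order. For a real index of 5492 files you would stream each line to a Writer instead of building one large String.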