lucene-java-user mailing list archives

From Adriano Crestani <adrianocrest...@gmail.com>
Subject Re: How to export lucene index to a simple text file?
Date Wed, 22 Sep 2010 09:40:07 GMT
> Saving the index in text format would also be a fun codec (in 4.0) to create :)

A codec like that would be welcome :)

On Wed, Sep 22, 2010 at 5:31 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> Saving the index in text format would also be a fun codec (in 4.0) to create :)
>
> I.e., the codec would be read/write. The performance wouldn't be great,
> but it'd be neat for debugging, teaching, and transparency purposes...
>
> Mike
>
> On Tue, Sep 21, 2010 at 9:26 PM, Lance Norskog <goksron@gmail.com> wrote:
>> The Lucene CheckIndex program opens an index and walks all of the data
>> structures. It is a good start for you.
>>
>> Sahin Buyrukbilen wrote:
>>>
>>> Thank you Uwe, I will read the docs and try to do it. However, do you have
>>> any example code? I need it because I am not very familiar with Java.
>>>
>>> Thank you.
>>>
>>> Sahin
>>>
>>> On Tue, Sep 21, 2010 at 12:29 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> Retrieve a TermEnum and iterate over it. That gives you all terms, and for
>>>> each one you can retrieve the docFreq, which is the second column in your
>>>> table. Finally, for each term, position the TermDocs enum on that term to
>>>> get all document IDs. Read the docs of IndexReader/TermEnum/TermDocs about
>>>> this.
>>>>
>>>> Uwe
>>>>
>>>> -----
>>>> Uwe Schindler
>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>> http://www.thetaphi.de
>>>> eMail: uwe@thetaphi.de
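
The walk Uwe describes (iterate all terms, read each term's document frequency, then list the documents for that term) can be sketched without Lucene at all. The block below is only an illustration under stated assumptions: the class name `InvertedIndexDump`, its `build` and `dump` methods, the whitespace/punctuation tokenizer, and the tab-separated layout are all hypothetical, and the second number in each pair is the raw in-document term count, because the thread does not say how the rank should be computed. In the Lucene 3.x API the outer loop would correspond to `IndexReader.terms()` with `TermEnum.next()`, `term()` and `docFreq()`, and the inner loop to `IndexReader.termDocs(Term)` with `TermDocs.next()`, `doc()` and `freq()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Stand-alone sketch: no Lucene classes are used; all names here are hypothetical.
class InvertedIndexDump {

    // Builds term -> (docId -> number of occurrences in that document), terms sorted.
    static SortedMap<String, SortedMap<Integer, Integer>> build(List<String> docs) {
        SortedMap<String, SortedMap<Integer, Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            // Assumption: lower-case and split on non-word characters.
            for (String token : docs.get(docId).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split() can yield a leading empty token
                }
                index.computeIfAbsent(token, t -> new TreeMap<>())
                     .merge(docId, 1, Integer::sum);
            }
        }
        return index;
    }

    // Writes one line per term: term, document frequency, then <docId, count> pairs.
    static String dump(SortedMap<String, SortedMap<Integer, Integer>> index) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, SortedMap<Integer, Integer>> termEntry : index.entrySet()) {
            SortedMap<Integer, Integer> postings = termEntry.getValue();
            // Document frequency = number of documents containing the term.
            out.append(termEntry.getKey()).append('\t').append(postings.size());
            for (Map.Entry<Integer, Integer> posting : postings.entrySet()) {
                // Raw count stands in for the unspecified rank value.
                out.append(String.format(Locale.ROOT, "\t<%d, %d>",
                        posting.getKey(), posting.getValue()));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("big dark", "big big");
        System.out.print(dump(build(docs)));
    }
}
```

Replacing the raw count with a real ranking weight would mean plugging a scoring formula into the inner loop; which formula to use is left open here, as it is in the original question.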
>>>>
>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Sahin Buyrukbilen [mailto:sahin.buyrukbilen@gmail.com]
>>>>> Sent: Tuesday, September 21, 2010 9:12 AM
>>>>> To: java-user@lucene.apache.org
>>>>> Subject: How to export lucene index to a simple text file?
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am currently working on a project about private information retrieval
>>>>> and I need to have an inverted index file in txt format as follows:
>>>>>
>>>>> Term t    freq t    Inverted list for t
>>>>> ---------------------------------------------
>>>>> and       1         <6, 0.159>
>>>>> big       2         <2, 0.148>  <3, 0.088>
>>>>> dark      1         <6, 0.079>
>>>>> ...
>>>>>
>>>>> Here each <number1, number2> pair means that term t occurs in the
>>>>> document with ID number1, with a rank of number2.
>>>>>
>>>>> I have created an index from 5492 txt files; however, the index is
>>>>> composed of different files, and most of the data is not in text format.
>>>>>
>>>>> Could somebody guide me on how to achieve this?
>>>>>
>>>>> Thank you
>>>>>
>>>>> Sahin.
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>
>>>
>>
>
