lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: How to export lucene index to a simple text file?
Date Wed, 22 Sep 2010 09:31:06 GMT
Saving the index in text format would also be a fun codec (in 4.0) to create :)

Ie, the codec would be read/write.  The performance wouldn't be great,
but it'd be neat for debugging, teaching, transparency purposes...

Mike

On Tue, Sep 21, 2010 at 9:26 PM, Lance Norskog <goksron@gmail.com> wrote:
> The Lucene CheckIndex program opens an index and walks all of the data
> structures. It is a good start for you.
>
> Sahin Buyrukbilen wrote:
>>
>> Thank you Uwe, I will read the docs and try to do it, however do you have
>> an
>> example code? I need because I am not very familiar with Java.
>>
>> Thank you.
>>
>> Sahin
>>
>> On Tue, Sep 21, 2010 at 12:29 PM, Uwe Schindler<uwe@thetaphi.de>  wrote:
>>
>>
>>>
>>> Hi,
>>>
>>> Retrieve a TermEnum and iterate it. By that you get all terms and can
>>> retrieve the docFreq, which is the second column in your table. Finally
>>> for
>>> each term you position the TermDocs enum on this term to get all document
>>> ids. Read docs of IndexReader/TermEnum/TermDocs about this.
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>>
>>>>
>>>> -----Original Message-----
>>>> From: Sahin Buyrukbilen [mailto:sahin.buyrukbilen@gmail.com]
>>>> Sent: Tuesday, September 21, 2010 9:12 AM
>>>> To: java-user@lucene.apache.org
>>>> Subject: How to export lucene index to a simple text file?
>>>>
>>>> Hi,
>>>>
>>>> I am currently working on a project about private information retrieval
>>>>
>>>
>>> and I
>>>
>>>>
>>>> need to have an inverted index file in txt format as follows:
>>>>
>>>> Term t    freq t      Inverted list for t
>>>>
>>>> -------------------------------------------------------------------------
>>>> and          1<6, 0.159>
>>>> big           2<2, 0.148>  <3, 0.088>
>>>> dark         1<6, 0.079>
>>>> .
>>>> .
>>>> .
>>>> .
>>>>
>>>> here the<number1, number2>  pairs are indicating: number1: doc ID,
where
>>>> term t exist with a rank of number2.
>>>>
>>>> I have created an index from 5492 txt files, however the index is
>>>>
>>>
>>> composed
>>> of
>>>
>>>>
>>>> different files and most of the data is not in the text format.
>>>>
>>>> could somebody guide me to achieve this?
>>>>
>>>> Thank you
>>>>
>>>> Sahin.
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message