lucene-java-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: How to export lucene index to a simple text file?
Date Wed, 22 Sep 2010 01:26:48 GMT
The Lucene CheckIndex program opens an index and walks all of the data 
structures. It is a good start for you.
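
As a minimal sketch of the term-by-term walk that Uwe describes in the quoted
message below, the following Java program prints an inverted index in the
requested layout. It deliberately does not call Lucene: a TreeMap stands in for
the term dictionary (the part a TermEnum would iterate) and a List stands in
for each term's postings (the part a TermDocs would iterate). The class
InvertedIndexExport, the Posting type, and the export method are hypothetical
names introduced for illustration, and the sample entries are the three rows
from Sahin's table. How the rank value would be obtained from a real index is
not covered here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class InvertedIndexExport {

    /** One entry of a term's inverted list: a document ID and the term's rank in it. */
    static final class Posting {
        final int docId;
        final double rank;

        Posting(int docId, double rank) {
            this.docId = docId;
            this.rank = rank;
        }
    }

    /** Builds one text line per term: term, document frequency, then <docId, rank> pairs. */
    static String export(Map<String, List<Posting>> index) {
        StringBuilder out = new StringBuilder();
        // Outer loop: visit every term (the role a TermEnum plays, per the thread).
        for (Map.Entry<String, List<Posting>> entry : index.entrySet()) {
            List<Posting> postings = entry.getValue();
            // Document frequency is the number of documents that contain the term.
            out.append(String.format(Locale.ROOT, "%-12s %d", entry.getKey(), postings.size()));
            // Inner loop: visit every document of this term (the role of TermDocs).
            for (Posting p : postings) {
                out.append(String.format(Locale.ROOT, " <%d, %.3f>", p.docId, p.rank));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in data: the three example rows from the original question.
        Map<String, List<Posting>> index = new TreeMap<>();
        index.put("and", Arrays.asList(new Posting(6, 0.159)));
        index.put("big", Arrays.asList(new Posting(2, 0.148), new Posting(3, 0.088)));
        index.put("dark", Arrays.asList(new Posting(6, 0.079)));
        System.out.print(export(index));
    }
}
```

A TreeMap keeps the terms sorted, so the output lists them alphabetically. To
write a file instead of printing, the returned string could be passed to a
java.io.FileWriter.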

Sahin Buyrukbilen wrote:
> Thank you Uwe, I will read the docs and try to do it. However, do you have any
> example code? I need it because I am not very familiar with Java.
>
> Thank you.
>
> Sahin
>
> On Tue, Sep 21, 2010 at 12:29 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>> Hi,
>>
>> Retrieve a TermEnum from the IndexReader and iterate over it. That gives you
>> every term along with its docFreq, which is the second column in your table.
>> Then, for each term, position a TermDocs enum on that term to get all of its
>> document IDs. See the docs of IndexReader/TermEnum/TermDocs for details.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>      
>>> -----Original Message-----
>>> From: Sahin Buyrukbilen [mailto:sahin.buyrukbilen@gmail.com]
>>> Sent: Tuesday, September 21, 2010 9:12 AM
>>> To: java-user@lucene.apache.org
>>> Subject: How to export lucene index to a simple text file?
>>>
>>> Hi,
>>>
>>> I am currently working on a project about private information retrieval and I
>>> need to have an inverted index file in txt format as follows:
>>>
>>> Term t    freq t    Inverted list for t
>>> -------------------------------------------------------------------------
>>> and       1         <6, 0.159>
>>> big       2         <2, 0.148>  <3, 0.088>
>>> dark      1         <6, 0.079>
>>> .
>>> .
>>> .
>>> .
>>>
>>> Here each <number1, number2> pair indicates: number1 is the ID of a document
>>> in which term t exists, and number2 is the rank of the term in that document.
>>>
>>> I have created an index from 5492 txt files, however the index is composed of
>>> different files and most of the data is not in the text format.
>>>
>>> Could somebody guide me on how to achieve this?
>>>
>>> Thank you
>>>
>>> Sahin.
>>>        
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>      
>    
