lucene-dev mailing list archives

From Steven A Rowe <>
Subject RE: Converting an existing index format to Lucene Index
Date Fri, 25 Feb 2011 13:53:28 GMT
If you sort your old index file by filename, then iterate over the sorted file, your problem
is solved, no?
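Here is a minimal Java sketch of the sort-then-iterate idea. It assumes the pairs have already been parsed into a hypothetical `Pair(word, fileName)` record and sorted by file name. Consecutive pairs that share a file name are collected into one group, so only one file's words are held in memory at a time. `SortedGrouping`, `groupSorted` and the callback are illustrative names, not Lucene API. The callback is where a real indexer would build one Lucene `Document` per file and pass it to `IndexWriter.addDocument`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiConsumer;

public class SortedGrouping {
    // Hypothetical representation of one <word, fileName> entry.
    public record Pair(String word, String fileName) {}

    // Walks pairs that are already sorted by fileName and emits one
    // (fileName, words) group per file. Only the current file's words
    // are held in memory at any time.
    public static void groupSorted(Iterable<Pair> sortedPairs,
                                   BiConsumer<String, List<String>> emit) {
        String currentFile = null;
        List<String> words = new ArrayList<>();
        for (Pair p : sortedPairs) {
            if (currentFile != null && !currentFile.equals(p.fileName())) {
                // File name changed: the previous file is complete.
                // Hypothetical hook: a real indexer would build a Lucene
                // Document here and hand it to IndexWriter.addDocument.
                emit.accept(currentFile, words);
                words = new ArrayList<>();
            }
            currentFile = p.fileName();
            words.add(p.word()); // repeated words are kept as repeats
        }
        if (currentFile != null) {
            emit.accept(currentFile, words); // flush the last file
        }
    }

    public static void main(String[] args) {
        List<Pair> pairs = new ArrayList<>(List.of(
            new Pair("my", "file1"),
            new Pair("you", "file2"),
            new Pair("my", "file2"),
            new Pair("my", "file1")));
        // In-memory sort for this demo only; a very large index would be
        // sorted externally beforehand and then streamed through groupSorted.
        pairs.sort(Comparator.comparing(Pair::fileName));
        groupSorted(pairs, (file, ws) -> System.out.println(file + " -> " + ws));
    }
}
```

Because a file's repeated words (such as "my" appearing twice for file1) stay in its group as repeated tokens, indexing them together in a single field should preserve the per-file term frequency, with no need to update a Document after it has been added.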

From: Lokendra Singh []
Sent: Friday, February 25, 2011 1:27 AM
Subject: Converting an existing index format to Lucene Index

Hi all,

I am seeking some guidelines on directly converting an existing index into a Lucene index.
The index available to me is a set of <value1, value2> pairs, where each pair is:
< word, fileName >
i.e., a word as 'value1', and the name of a file containing that word as 'value2'.

A word might appear in several files, and the same file can contain multiple copies of
a word. For example, the following index is possible:
< "my", "file1" >
< "you", "file2" >
< "my", "file2" >
< "my", "file1" >
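If the existing index is stored as text lines in exactly the form shown above (an assumption; the actual on-disk format is not stated), each line could be parsed into a (word, fileName) pair with a regular expression. Here is a minimal Java sketch, with the `PairParser` class and its `Pair` record as hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairParser {
    // Hypothetical representation of one <word, fileName> entry.
    public record Pair(String word, String fileName) {}

    // Assumed line format: < "word" , "fileName" >
    // with arbitrary whitespace around the brackets and the comma.
    private static final Pattern PAIR =
        Pattern.compile("<\\s*\"([^\"]*)\"\\s*,\\s*\"([^\"]*)\"\\s*>");

    public static Pair parse(String line) {
        Matcher m = PAIR.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a pair line: " + line);
        }
        return new Pair(m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        // The sample lines from the message, with their original spacing.
        String[] lines = {
            "< \"my\"  , \"file1\" >",
            "< \"you\" , \"file2\" >",
            "< \"my\",  \"file2\" >",
            "< \"my\", \"file1\">"
        };
        for (String line : lines) {
            System.out.println(parse(line));
        }
    }
}
```

Parsing one line at a time like this keeps memory use independent of the index size, which matters for the very large input described below.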

My actual problem is that the index available to me is very large, so I am a bit
reluctant to create a 'Document' object for each file: to do that, I would have to read through
all the pairs first and store them in memory. Alternatively, I would have to 'update' the 'Document'
object of a particular file while iterating through the pairs of my index, and this 'update' is,
again, a costly operation.

Please correct me if my understanding of Lucene is wrong, or suggest alternative approaches.
