lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rob Jose" <rjos...@comcast.net>
Subject Re: Index Size
Date Thu, 19 Aug 2004 13:42:39 GMT
Bernhard
Thanks for responding.  I do have an IndexReader open on the Temp index.  I
pass this IndexReader into the addIndexes method on the IndexWriter to add
these files.  I did notice that I have a ton of CFS files that I removed and
was still able to read the indexes.  Are these the temporary segment files
you are talking about?  Here is my code that adds the temp index to the prod
index.
File tempFile = new File(sIndex + File.separatorChar + "temp" + sCntyCode);
tempReader = IndexReader.open(tempFile);

try

{

boolean createIndex = false;

File f = new File(sIndex + File.separatorChar + sCntyCode);

if (!f.exists())

{

createIndex = true;

}

prodWriter = new IndexWriter(sIndex + File.separatorChar + sCntyCode, new
StandardAnalyzer(), createIndex);

}

catch (Exception e)

{

IndexReader.unlock(FSDirectory.getDirectory(sIndex + File.separatorChar +
sCntyCode, false));

CasesReports.log("Tried to Unlock " + sIndex);

prodWriter = new IndexWriter(sIndex, new StandardAnalyzer(), false);

CasesReports.log("Successfully Unlocked " + sIndex + File.separatorChar +
sCntyCode);

}

prodWriter.setUseCompoundFile(true);

prodWriter.addIndexes(new IndexReader[] { tempReader });



Am I doing something wrong?  Any help would be extremely appreciated.



Thanks

Rob

----- Original Message ----- 
From: "Bernhard Messer" <Bernhard.Messer@intrafind.de>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, August 19, 2004 1:09 AM
Subject: Re: Index Size


Rob,

as Doug and Paul already mentioned, the index size is definitely to big :-(.

What could raise the problem, especially when running on a windows
platform, is that an IndexReader is open during the whole index process.
During indexing, the writer creates temporary segment files which will
be merged into bigger segments. If done, the old segment files will be
deleted. If there is an open IndexReader, the environment is unable to
unlock the files and they still stay in the index directory. You will
end up with an index, several times bigger than the dataset.

Can you check your code for any open IndexReaders when indexing, or
paste the relevant part to the list so we could have a look on it.

hope this helps
Bernhard


Rob Jose wrote:

>Hello
>I have indexed several thousand (52 to be exact) text files and I keep
running out of disk space to store the indexes.  The size of the documents I
have indexed is around 2.5 GB.  The size of the Lucene indexes is around 287
GB.  Does this seem correct?  I am not storing the contents of the file,
just indexing and tokenizing.  I am using Lucene 1.3 final.  Can you guys
let me know what you are experiencing?  I don't want to go into production
with something that I should be configuring better.
>
>I am not sure if this helps, but I have a temp index and a real index.  I
index the file into the temp index, and then merge the temp index into the
real index using the addIndexes method on the IndexWriter.  I have also set
the production writer setUseCompoundFile to true.  I did not set this on the
temp index.  The last thing that I do before closing the production writer
is to call the optimize method.
>
>I would really appreciate any ideas to get the index size smaller if it is
at all possible.
>
>Thanks
>Rob
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message