lucene-java-user mailing list archives

From "Rob Jose" <rjos...@comcast.net>
Subject Re: Index Size
Date Thu, 19 Aug 2004 13:25:31 GMT
Paul
Thank you for your response.  I have appended to the bottom of this message
the field structure that I am using.  I hope that this helps.  I am using
the StandardAnalyzer.  I do not believe that I am changing any default
values, but I have also appended the code that adds the temp index to the
production index.

Thanks for your help
Rob

Here is the code that describes the field structure.
public static Document Document(String contents, String path, Date modified,
        String runDate, String totalpages, String pagecount, String countycode,
        String reportnum, String reportdescr)
{
    // Display format comes from configuration; the search format sorts as text.
    SimpleDateFormat showFormat = new
        SimpleDateFormat(TurbineResources.getString("date.default.format"));
    SimpleDateFormat searchFormat = new SimpleDateFormat("yyyyMMdd");

    Document doc = new Document();

    // Field.Keyword: stored and indexed, not tokenized.
    // Field.UnStored: indexed and tokenized, not stored.
    // Field.Text (String value): stored, indexed, and tokenized.
    doc.add(Field.Keyword("path", path));
    doc.add(Field.Keyword("modified", showFormat.format(modified)));
    doc.add(Field.UnStored("searchDate", searchFormat.format(modified)));
    doc.add(Field.Keyword("runDate", runDate==null?"":runDate));
    // Reorder runDate so that the year comes first.
    doc.add(Field.UnStored("searchRunDate",
        runDate==null?"":runDate.substring(6)+runDate.substring(0,2)+runDate.substring(3,5)));
    doc.add(Field.Keyword("reportnum", reportnum));
    doc.add(Field.Text("reportdescr", reportdescr));
    doc.add(Field.UnStored("cntycode", countycode));
    doc.add(Field.Keyword("totalpages", totalpages));
    doc.add(Field.Keyword("page", pagecount));
    doc.add(Field.UnStored("contents", contents));

    return doc;
}
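
The searchRunDate expression above can be checked in isolation. The following is only a minimal sketch: it assumes runDate arrives as a 10-character string such as MM/dd/yyyy (the message does not state the format), and the class and method names are hypothetical.

```java
final class RunDateKey {
    // Hypothetical helper mirroring the substring expression in the post.
    // Assumes a 10-character date such as "MM/dd/yyyy"; null maps to "".
    static String toSearchKey(String runDate) {
        if (runDate == null) {
            return "";
        }
        if (runDate.length() != 10) {
            throw new IllegalArgumentException("expected 10 characters, got: " + runDate);
        }
        // Year first, then the two leading fields, so keys sort as text.
        return runDate.substring(6) + runDate.substring(0, 2) + runDate.substring(3, 5);
    }

    public static void main(String[] args) {
        System.out.println(toSearchKey("08/19/2004")); // prints 20040819
    }
}
```

The length check is an addition for illustration; the posted code would instead throw StringIndexOutOfBoundsException on a shorter string.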



Here is the code that adds the temp index to the production index.

File tempFile = new File(sIndex + File.separatorChar + "temp" + sCntyCode);
tempReader = IndexReader.open(tempFile);

// Path of the per-county production index.
String prodPath = sIndex + File.separatorChar + sCntyCode;

try
{
    // Create the production index only if its directory does not exist yet.
    boolean createIndex = !new File(prodPath).exists();
    prodWriter = new IndexWriter(prodPath, new StandardAnalyzer(), createIndex);
}
catch (Exception e)
{
    // Assume a stale lock caused the failure: clear it, then reopen the
    // same per-county index.
    IndexReader.unlock(FSDirectory.getDirectory(prodPath, false));
    CasesReports.log("Tried to Unlock " + prodPath);
    prodWriter = new IndexWriter(prodPath, new StandardAnalyzer(), false);
    CasesReports.log("Successfully Unlocked " + prodPath);
}

prodWriter.setUseCompoundFile(true);

// Merge the temporary index into the production index.
prodWriter.addIndexes(new IndexReader[] { tempReader });





----- Original Message ----- 
From: "Paul Elschot" <paul.elschot@xs4all.nl>
To: <lucene-user@jakarta.apache.org>
Sent: Thursday, August 19, 2004 12:16 AM
Subject: Re: Index Size


On Wednesday 18 August 2004 22:44, Rob Jose wrote:
> Hello
> I have indexed several thousand (52 to be exact) text files and I keep
> running out of disk space to store the indexes.  The size of the documents
> I have indexed is around 2.5 GB.  The size of the Lucene indexes is around
> 287 GB.  Does this seem correct?  I am not storing the contents of the

As noted, one would expect the index size to be about 35%
of the original text, i.e. about 2.5 GB * 35% = 875 MB.
That is more than two orders of magnitude less than what you have.
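
As a rough check of the arithmetic above, here is a minimal Java sketch. It uses only the figures quoted in this thread (2.5 GB of text, a 35% rule of thumb, a 287 GB index); the class and method names are hypothetical.

```java
import java.util.Locale;

final class IndexSizeEstimate {
    // Expected index size for a given amount of text and an index-to-text ratio.
    static double expectedIndexGb(double textGb, double ratio) {
        return textGb * ratio;
    }

    // How many times larger the observed index is than the expected one.
    static double blowUpFactor(double observedGb, double expectedGb) {
        return observedGb / expectedGb;
    }

    public static void main(String[] args) {
        double expected = expectedIndexGb(2.5, 0.35);
        double factor = blowUpFactor(287.0, expected);
        System.out.println(String.format(Locale.ROOT, "expected index: %.3f GB", expected));
        System.out.println(String.format(Locale.ROOT, "observed / expected: %.0f", factor));
    }
}
```

The expected size comes out a little under 0.9 GB, so the observed index is a few hundred times larger than the rule of thumb predicts.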

Could you provide some more information about the field structure,
i.e. how many fields, which fields are stored, which fields are indexed,
any use of non-standard analyzers, and any non-standard
Lucene settings?

You might also try switching to the non-compound format to have a look
at the sizes of the individual index files; see the file formats page on
the Lucene web site.
You can then see the total disk size of, for example, the stored fields.
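
One way to look at the individual file sizes is a small standalone program that lists an index directory with plain java.io.File. This is only a sketch: it does not use Lucene at all, the class and method names are hypothetical, and the directory is passed as the first command-line argument.

```java
import java.io.File;
import java.util.Arrays;

final class IndexFileSizes {
    // Sums the sizes of the regular files directly inside dir (no recursion).
    static long totalBytes(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0L; // not a directory, or not readable
        }
        long total = 0L;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    // Prints each file with its size in bytes, largest first.
    static void printSizes(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return;
        }
        Arrays.sort(files, (a, b) -> Long.compare(b.length(), a.length()));
        for (File f : files) {
            if (f.isFile()) {
                System.out.println(f.length() + "\t" + f.getName());
            }
        }
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        printSizes(dir);
        System.out.println(totalBytes(dir) + "\ttotal");
    }
}
```

With the non-compound format, the file extensions in the listing can be matched against the file formats documentation to see which part of the index takes the space.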

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

