lucene-java-user mailing list archives

From "Rob Staveley (Tom)" <>
Subject RE: Seeing what's occupying all the space in the index
Date Fri, 26 May 2006 09:33:50 GMT
In my index of e-mail message parts, it looks like roughly 23K of index space
is being used for each indexed message part, which is far more than I'd expect.

I have a total of 37 fields per message part.
I tokenize, index and do not store message part bodies.
I store a <= 300 character synopsis of each message part.
All of the other fields are message metadata (for example To, From, Cc,
Subject, Date). They are tokenized, indexed and stored, but they rarely
exceed 100 characters each.

I'm still using Lucene 1.4.3, but am in the process of migrating to 1.9.

Is there any way that I can get a picture of what's occupying all the space?
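
One rough way to get a first picture, independent of the Lucene API, is to total the sizes of the files in the index directory by extension. The sketch below is only an illustration under some assumptions: the index lives in an ordinary filesystem directory, it uses modern Java (java.nio.file) rather than anything available in 2006, and the class and method names (IndexSpaceReport, sizeByExtension) are made up for this example. In a non-compound index the extensions hint at where the space goes: as far as I know, .fdt/.fdx hold stored field values, .tis/.tii the term dictionary, .frq term frequencies and .prx term positions. If the index uses the compound file format, only .cfs files will show up and this breakdown will not be visible.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class IndexSpaceReport {

    /** Sums the sizes of the regular files directly inside dir, keyed by file extension. */
    public static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files without an extension (e.g. "segments") are grouped together.
                String ext = dot >= 0 ? name.substring(dot) : "(none)";
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // The index directory path is taken from the command line (assumed layout).
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Long> e : sizeByExtension(indexDir).entrySet()) {
            System.out.printf("%-8s %,12d bytes%n", e.getKey(), e.getValue());
        }
    }
}
```

Running it as `java IndexSpaceReport /path/to/index` (the path is a placeholder) prints one line per extension. Dividing each total by the number of indexed message parts gives a per-part figure to compare against the 23K.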
