lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@syr.edu>
Subject Re: Seeing what's occupying all the space in the index
Date Fri, 26 May 2006 11:41:03 GMT
Give Luke a try.  Google for "Luke Lucene" and you should find it.  
Otherwise check the Lucene website for a reference.
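
A rough first look is also possible without any extra tool, by summing the sizes of the files in the index directory by extension. The following is only a sketch under stated assumptions: the class and method names (IndexSpaceByExtension, sizeByExtension) are made up for illustration, it uses plain JDK file APIs rather than anything from Lucene, and it assumes the index is not stored in compound format (with compound files, most of the data sits in a single .cfs file per segment and this breakdown tells you little). In a non-compound index, .fdt/.fdx hold the stored fields, .tis/.tii the term dictionary, and .frq/.prx the term frequencies and positions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: groups index files by extension.
public class IndexSpaceByExtension {

    /** Sums the sizes of the regular files directly inside dir, keyed by file extension. */
    static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files such as "segments" have no extension at all.
                String ext = dot >= 0 ? name.substring(dot + 1) : "(none)";
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // First argument: path to the index directory (defaults to the current directory).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Long> e : sizeByExtension(dir).entrySet()) {
            System.out.printf("%-8s %,12d bytes%n", e.getKey(), e.getValue());
        }
    }
}
```

If the stored-field files dominate, the stored metadata and synopsis are the likely cost; if .frq and .prx dominate, the tokenized bodies are. Luke will still give a much finer per-field picture.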

Rob Staveley (Tom) wrote:
> In my index of e-mail message parts, it looks like 23K is being used up for
> each indexed message part, which is way more than I'd expect. 
>
> I have a total of 37 fields per message part.
> I tokenize, index and do not store message part bodies.
> I store a <= 300 character synopsis of each message part.
> All of the other fields are message metadata, which is tokenized, indexed
> and stored; these fields rarely exceed 100 characters (for example To,
> From, Cc, Subject, Date).
>
> I'm still using Lucene 1.4.3, but am in the process of migrating to 1.9.
>
> Is there any way that I can get a picture of what's occupying all the space?
>   

-- 

Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 

