lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <>
Subject Re: Field name size and index size
Date Sat, 22 Mar 2008 14:27:11 GMT

Summary: I think there will be no real impact if you use longer field  


Index size will be just a tiny bit bigger.  There is a single file  
per segment (*.fnm) that resolves the field names into integer IDs,  
then the rest of the index uses these integer IDs.  So only that  
file, which holds one copy of the string name of each field, will be  

Indexing may be a bit faster with shorter names just because Lucene  
does a hashtable lookup of that string (one per field per document)  
to find the corresponding integer.  My guess is this is a very  
negligible impact, especially if the content of each field is long.   
Searching goes through a similar hashtable lookup, after which the  
field name is interned, but for a largish index that time is surely  
negligible as well.  But you should test!  And please report back :)


John wrote:
> Hi,
> Lets say my data source consists of records like so (the example is  
> Field=Value):
> And lets say I a second copy of my data but this time it looks like  
> so:
> ? A=Value1
> ? B=Value2
> ? C=Value3
> ? D=Value4
> I..e, same data, the only change is the field names?are now shorter
> Now if?i create two Lucene indexes ... one using the long field  
> name and one using the short field name (my data has not  
> changed) .. will the Lucene index size be smaller for the short  
> field name one?? Will updating and optimizing the index be faster??  
> Will searching be faster?
> That is, i'm I better off using short field names vs. long field  
> names?
> Yes, i will do some performance analyses .. but i want to know if  
> this matters before I do so.
> Thanks in advance!
> -JM

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message