lucene-dev mailing list archives

From: Doug Cutting <cutt...@apache.org>
Subject: compound format as default in 1.4?
Date: Mon, 08 Mar 2004 20:25:15 GMT
[ I moved this discussion to the developer list.]

My metric here is the rate of complaint.

I'm tired of hearing about "too many file handles" problems.  Usually
it is caused by folks opening a new searcher for each query, and the 
garbage collector not collecting and closing the old ones fast enough, 
so it signals other problems with the application, but it is still 
annoying, and could be largely quashed.
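
The pattern that avoids this failure mode can be sketched in plain Java as
follows. This is only an illustration under stated assumptions: FakeSearcher
and SharedSearcher are hypothetical stand-ins, not Lucene classes, and the
counter merely models "handles held open until close". The point is that one
long-lived instance serves every query instead of a fresh one per query.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSearcherSketch {

    /** Hypothetical stand-in for a searcher: counts as "open" until closed. */
    static final class FakeSearcher implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();

        FakeSearcher() {
            OPEN.incrementAndGet();
        }

        String search(String query) {
            return "results for " + query;
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    /** Hypothetical holder: opens one searcher lazily and reuses it for all callers. */
    static final class SharedSearcher implements AutoCloseable {
        private FakeSearcher instance;

        synchronized FakeSearcher get() {
            if (instance == null) {
                instance = new FakeSearcher();
            }
            return instance;
        }

        @Override
        public synchronized void close() {
            if (instance != null) {
                instance.close();
                instance = null;
            }
        }
    }

    public static void main(String[] args) {
        try (SharedSearcher shared = new SharedSearcher()) {
            // 1000 queries, but only one searcher is ever opened.
            for (int i = 0; i < 1000; i++) {
                shared.get().search("query " + i);
            }
            System.out.println("open searchers: " + FakeSearcher.OPEN.get());
        }
        // The single instance is closed explicitly rather than left to the GC.
        System.out.println("open after close: " + FakeSearcher.OPEN.get());
    }
}
```

Closing the one shared instance explicitly, rather than waiting for the
garbage collector, is what keeps the number of open handles bounded in this
sketch.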

By some definition, anything which causes so many repeated complaints is 
a bug, and should be fixed.  Even if it's really not a bug.  It pains 
users of Lucene.  It annoys developers of Lucene.

Think of it like mergeFactor, etc.: the default setting may not be the 
absolute fastest, but it is one that is likely to run well in most 
configurations and cause the least confusion.

Doug

Terry Steichen wrote:
> I tend to agree (but with the same uncertainty as to why I feel that way).
> 
> Regards,
> 
> Terry
> ----- Original Message ----- 
> From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Monday, March 08, 2004 2:34 PM
> Subject: Re: Sys properties Was: java.io.tmpdir as lock dir .... once again
> 
> 
> 
>>I can't explain why, but I feel like the old index format should stay
>>by default.  I feel like I'd rather have a (slightly) faster index, and
>>switch to the compound one when/IF I encounter problems, than have a
>>safer, but slower index, and never realize that there is a faster
>>option available.
>>
>>Weak argument, I know, but some instinct in me thinks that the current
>>mode should remain.
>>
>>Otis
>>
>>
>>--- Doug Cutting <cutting@apache.org> wrote:
>>
>>>hui wrote:
>>>
>>>>Index time: 
>>>>compound format is 89 seconds slower.
>>>>
>>>>compound format:
>>>>1389507 total milliseconds
>>>>non-compound format:
>>>>1300534 total milliseconds
>>>>
>>>>The index size is 85m with 4 fields only. The files are stored in
>>>>the index.
>>>>The compound format has only 3 files and the other has 13 files.
>>>
>>>Thanks for performing this benchmark!
>>>
>>>It looks like the compound format is around 7% slower when indexing.
>>>To my thinking that's acceptable, given the dramatic reduction in file
>>>handles.  If folks really need maximal indexing performance, then
>>>they can explicitly disable the compound format.
>>>
>>>Would anyone object to making compound format the default for Lucene
>>>1.4?  This is an incompatible change, but I don't think it should
>>>break applications.
>>>
>>>Doug
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
> 
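The figures in the quoted benchmark can be checked with a few lines of
arithmetic. The sketch below uses only the two totals reported in the thread
(1389507 ms and 1300534 ms); the class and method names are hypothetical and
exist only for this illustration.

```java
import java.util.Locale;

public class CompoundOverhead {

    /** Extra indexing time of the compound format, in milliseconds. */
    static long extraMillis(long compoundMillis, long nonCompoundMillis) {
        return compoundMillis - nonCompoundMillis;
    }

    /** Slowdown of the compound format relative to the non-compound run, in percent. */
    static double percentSlower(long compoundMillis, long nonCompoundMillis) {
        return 100.0 * (compoundMillis - nonCompoundMillis) / nonCompoundMillis;
    }

    public static void main(String[] args) {
        long compound = 1389507L;     // total milliseconds, compound format
        long nonCompound = 1300534L;  // total milliseconds, non-compound format

        System.out.printf(Locale.ROOT, "extra time: %d ms%n",
                extraMillis(compound, nonCompound));
        System.out.printf(Locale.ROOT, "slowdown: %.1f%%%n",
                percentSlower(compound, nonCompound));
    }
}
```

The difference is 88973 ms, which matches the "89 seconds" in the report, and
the relative slowdown is about 6.8%, consistent with the "around 7%" estimate.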

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org