db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kim Haase <Camilla.Ha...@Sun.COM>
Subject Re: FW: Advice on *very* badly performing query
Date Wed, 05 Dec 2007 21:19:29 GMT
That sounds like a "yes" -- I'll consider this further and file an issue.


Mike Matrigali wrote:
> Most of the "stat" information that derby uses is automatically kept up
> to date as part of underlying index and table maintenance.  This info
> includes count of rows in the table and data distribution of data in
> indexes.
> The one piece of info that is not kept up to date is average number of
> duplicates for columns in an index.  This stat is given a default and
> then is updated whenever you create an index, run the discussed compress
> option, and as a side effect of some of the alter table commands.  I 
> don't remember what the default is, something like 10%.
> /mikem
> Kim Haase wrote:
>> Do you think it might also help to put the information into the Tuning 
>> Guide under "Performance tips and tricks"?
>> http://db.apache.org/derby/docs/dev/tuning/tuning-single.html#ctunperf22457 
>> Thanks,
>> Kim Haase
>> Matt Doran wrote:
>>> Dyre.Tjeldvoll@Sun.COM wrote:
>>>> Matt Doran <matt.doran@papercut.com> writes:
>>>>> I had not idea
>>>>> that derby didn't keep any stats up-to-date without performing that
>>>>> operation explicitly.  Ideally it would keep this up-to-date itself.
>>>>> The sys.sysstatistics didn't have any rows in it until I ran the
>>>>> compress table operation.
>>>> Agreed, but strangely I cannot recall many users actually requesting
>>>> this. Maybe people just suffer silently?   
>>> We have hundreds if not thousands of customers using our product and 
>>> not many of them have seen this pathological performance problem.   
>>> So maybe the optimizer does a good enough job in 90% of cases.  We 
>>> just happened to hit the an extremely bad case.
>>> So maybe it's just not something that people notice often.  Or they 
>>> just think "oh it's an embedded java database, it probably doesn't 
>>> perform that well.  Let's just upgrade to a real database".    That's 
>>> what we did, and it's what other people probably do.
>>> The beauty of the embedded DB is that it is self-maintaining.   I 
>>> suspect that if it maintained the statistics by itself and therefore 
>>> there were performance benefits ... it will improve people's 
>>> perception that it performs well.
>>>> Anyway, thank you for what I would call an exemplary
>>>> bug-report/question! Even though you use Hibernate you took the time
>>>> to identfy the actual SQL causing the problem, identified a minimal
>>>> repro and provided query plans.
>>> Thanks.  I had trouble understanding the behaviour ... so I though 
>>> that nobody would believe me unless I provided enough evidence.
>>>>> It really needs to be made more prominent in the documentation.
>>>>> i.e. once your database is loaded with representative data, perform
>>>>> the compress op for optimal performance.
>>>> Agreed. Any thoughts on where it would be good to mention it? If
>>>> you want, you can file a Jira issue about this.
>>> I'm not sure.  No-one is every going to read a whole manual.    But I 
>>> had read the ApacheCon performance presentations, and I don't 
>>> remember them ever mention this.  I think those presentations would 
>>> be one of the first places people start when they have performance 
>>> problems.   I know you can't change these retrospectively ... but 
>>> maybe making this clear in the wiki would be a start.
>>> I'm not sure if this is the appropriate page, but it was the only 
>>> thing that looked relevant to performance ( 
>>> http://wiki.apache.org/db-derby/PerformanceDiagnosisTips). It doesn't 
>>> mention the stats/compress.
>>> Maybe some of the tips in those presentations should be the distilled 
>>> into some performance tips wiki pages ... and also make it clear that 
>>> stats need to be updated.
>>> Regards,
>>> Matt

View raw message