db-derby-dev mailing list archives

From David Van Couvering <David.Vancouver...@Sun.COM>
Subject Re: more background threads
Date Mon, 04 Apr 2005 23:33:42 GMT
Hi, Mike, I'll look into it.  I'm on vacation most of this week but will 
be back in the saddle next week.

Regarding NIO, the performance improvement was on the network side, 
using the select-like way of handling client requests.  I am not so sure
how much better the NIO package is for disk I/O.  I did hear that
things are significantly better in JDK 1.5, but that's only hearsay at 
this point.


Mike Matrigali wrote:

> Any expertise you can lend in the area of I/O or scalability would
> be greatly appreciated.  I only have access to a dual processor
> machine (and it only has two 700 MHz CPUs).  Currently I am only
> guessing at where the problems might be; if you can identify problems,
> then projects should flow from that.
> Suresh and I looked at NIO for log I/O over a year ago and did not
> see any improvement on either Intel/Windows or Intel/Linux.  I thought
> that we could avoid some OS/JVM buffer copies by using it, but we
> never measured the difference.  If you can get log or database I/O to
> go faster using it, that would be great.
> Note that many standard performance benchmarks will quickly become
> I/O bound on Derby unless the hardware is configured optimally.  Unlike
> most databases, Derby requires the database to be on a single device and
> the log to be on a single device; the assumption is that applications
> needing better throughput can use the hardware or OS to spread I/O across
> multiple devices while presenting the view of a single device to the JVM,
> and thus to Derby.
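Mike's point about trying NIO for log I/O to avoid OS/JVM buffer copies can be made concrete with a small java.nio sketch. It is not Derby's log code: the class and method names (NioLogAppend, appendRecord) are hypothetical, and whether a direct buffer actually saves a copy, or makes log writes any faster, depends on the JVM and OS and would have to be measured, as the thread itself notes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioLogAppend {
    // Hypothetical helper: appends one log record and forces it to disk.
    // Returns the log file size after the write.
    static long appendRecord(FileChannel log, byte[] record) throws IOException {
        // A direct buffer is allocated outside the Java heap, so the JVM may be
        // able to pass it to the OS without an intermediate copy.
        ByteBuffer buf = ByteBuffer.allocateDirect(record.length);
        buf.put(record);
        buf.flip();
        while (buf.hasRemaining()) {
            log.write(buf);  // a single write may be partial
        }
        // force(false) syncs file content; metadata updates may not be synced.
        log.force(false);
        return log.size();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("log-sketch", ".dat");
        try (FileChannel log = FileChannel.open(file,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            appendRecord(log, "record-1".getBytes(StandardCharsets.UTF_8));
            long size = appendRecord(log, "record-2".getBytes(StandardCharsets.UTF_8));
            System.out.println("log size after two records: " + size);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Running main appends two 8-byte records to a temporary file and prints the resulting size (16 bytes).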
> David Van Couvering wrote:
>> Satheesh Bandaram wrote:
>>> Very interesting plans. What kind of application are you thinking of
>>> running on these 4-way or 8-way machines? Will you be using embedded
>>> driver or client-server network driver?
>> We have a few products within Sun who are interested in using Derby 
>> as their embedded data store.  They like its lightweight nature and 
>> the fact that it can be embedded.  Sun of course has higher-scale 
>> machines and customers who may want to scale up, so we want to make 
>> sure Derby can handle that.  I have to check, but I think there is 
>> interest in both the embedded and network drivers.
>>> I am also wondering how is the scalability of Java VMs these days? I
>>> know earlier VMs (a few years back) weren't scaling well beyond 3 or 4
>>> CPUs. Is that any better with Sun or IBM VMs?
>> Well, I don't personally have the details and exact numbers, but I 
>> know a lot of work has been done to improve scalability of the VM.  
>> The latest implementations at Sun use native threads and provide 
>> really good parallel GC implementations.   I know that our app 
>> server, which is written all in Java, scales quite well.  Generally 
>> we don't need to run more than one instance of the app server per 
>> machine on even the much bigger Sun boxes.
>> We also recently converted our web server to use the new NIO
>> package and the "select" model for handling incoming connections, and
>> now our all-Java web server scales better than our C-based web
>> server, which itself has been winning many of the performance
>> benchmarks out there.  I was actually going to bring this up at some
>> point as an idea for the TODO list -- convert the network I/O
>> and potentially the disk I/O subsystems of Derby to use NIO...
>> Cheers,
>> David
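The "select" model David describes, one thread multiplexing many connections instead of one thread per connection, looks roughly like this with java.nio.channels.Selector. This is a minimal accept-only sketch, not Sun's web server or Derby's network server code; the class and method names (SelectLoopSketch, selectOnce, closeAll) are hypothetical, and a real server would also read requests and keep per-connection state.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.Iterator;

public class SelectLoopSketch {
    // One pass of a select loop: waits up to timeoutMillis for ready channels,
    // accepts pending connections and registers them for reads.
    // Returns how many connections were accepted in this pass.
    static int selectOnce(Selector selector, long timeoutMillis) throws IOException {
        int accepted = 0;
        if (selector.select(timeoutMillis) == 0) {
            return 0;
        }
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();  // the selector never clears the selected-key set itself
            if (key.isAcceptable()) {
                ServerSocketChannel server = (ServerSocketChannel) key.channel();
                SocketChannel client = server.accept();  // null if nothing pending
                if (client != null) {
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                    accepted++;
                }
            }
            // A real server would also handle key.isReadable() here.
        }
        return accepted;
    }

    // Closes every channel registered with the selector, then the selector.
    static void closeAll(Selector selector) throws IOException {
        for (SelectionKey key : new ArrayList<>(selector.keys())) {
            key.channel().close();
        }
        selector.close();
    }

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));  // port 0: any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
            System.out.println("accepted: " + selectOnce(selector, 1000));
        } finally {
            closeAll(selector);
        }
    }
}
```

Because one thread services every ready channel, the thread count no longer grows with the number of connections, which is the property credited for the web server's scaling; whether the same approach would help Derby's network server would need measuring.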
>>> Satheesh
>>> David Van Couvering wrote:
>>>> Hi, Mike, thanks for the response and very helpful overview.  At first
>>>> blush it seems like the single daemon could easily be converted to a
>>>> thread pool approach where work is posted to a "dispatcher" who grabs
>>>> a thread and dispatches the work to it.  I say this without having yet
>>>> looked at the code, but in the meantime any reasons why this obviously
>>>> won't work would be much appreciated.
>>>> I can work on building up a test case that fills up the background
>>>> thread so we can "prove" that whatever solution we come up with helps
>>>> the system scale better.  I can post a test plan prior to actually
>>>> creating the test to see if you all agree the test looks to be what we
>>>> want it to be.
>>>> I can also look into testing Derby scalability on a 4-way or 8-way
>>>> machine, I think some of these are available in our lab.  I would also
>>>> like to do some testing on some of Sun's new multi-core chips, where
>>>> you have 8 threads per core and 4-8 cores per CPU.  Derby seems to be
>>>> well-suited to this architecture but it would be good to see if there
>>>> are any gotchas.  Again, I would propose these as plans first and get
>>>> your feedback.
>>>> What protocol do I use to sort of "identify" this as a sub-project and
>>>> track its progress?  Do I create a JIRA item labelled as an
>>>> "improvement" and assign it to myself?
>>>> Thanks,
>>>> David
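The dispatcher idea David raises, where callers post work and a dispatcher hands each unit to a free thread from a pool, maps directly onto java.util.concurrent. The sketch below is illustrative only: DispatcherSketch is a hypothetical class, not Derby's daemon service API, and the fixed pool size is an assumption chosen by the caller.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DispatcherSketch {
    private final ExecutorService workers;

    DispatcherSketch(int threads) {
        // A fixed pool; each posted unit of work goes to whichever worker is free.
        this.workers = Executors.newFixedThreadPool(threads);
    }

    // Callers post work here instead of to a single daemon thread.
    void post(Runnable work) {
        workers.execute(work);
    }

    // Stops accepting new work and waits for already-posted work to finish.
    // Returns false if the timeout expired first.
    boolean shutdown(long timeoutSeconds) throws InterruptedException {
        workers.shutdown();
        return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        DispatcherSketch dispatcher = new DispatcherSketch(4);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            dispatcher.post(done::incrementAndGet);
        }
        dispatcher.shutdown(10);
        System.out.println("completed " + done.get() + " units of work");
    }
}
```

A fixed pool keeps the number of background threads bounded. Unlike a single daemon, though, units of work can now run concurrently, so any work that must not overlap (for example, two hypothetical units touching the same table) would need its own coordination.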
>>>> Mike Matrigali wrote:
>>>>> I have changed the subject, as I completely missed the original post
>>>>> which had something to do with adding Junit tests.
>>>>> I am not sure what the right solution is here, but getting a
>>>>> discussion going would be good.
>>>>> Currently a number of store actions are queued in "post commit" mode,
>>>>> which means they should not be executed until after the transaction
>>>>> which queued them commits.  Currently there is one background thread
>>>>> which processes these; if its queue gets too full, then the work is
>>>>> done by the actual thread which queued the work.  Most of the post
>>>>> commit work involves reclaiming space from deleted rows after their
>>>>> transaction commits.
>>>>> Going forward there is going to be a need for more background work.
>>>>> I will soon be posting the first phase of work to allow for returning
>>>>> space back to the operating system; eventually it would be best if
>>>>> this work was also done in the background, somehow automatically
>>>>> queued by the system.
>>>>> I would also recommend coming up with a usage scenario which shows a
>>>>> problem before coding up a solution.  I believe a test with lots of
>>>>> users doing insert and delete should eventually show the background
>>>>> task being bogged down -- but I am not sure if moving work to
>>>>> additional threads is much better than just spreading the work out
>>>>> across the existing user threads.
>>>>> The code for the current background thread can be found in:
>>>>> opensource/java/engine/org/apache/derby/impl/services/daemon
>>>>> An example of one of the unit of work put on the queue is in:
>>>>> opensource/java/engine/org/apache/derby/impl/store/access/heap/heappostcommit.java
>>>>> Dan is probably the person who most recently worked on this code, and
>>>>> should have some comments in this area.  He should be back active on
>>>>> the list early next week.
>>>>> Note another interesting area of research/coding would be to see how
>>>>> Derby scales on machines with larger numbers of processors.  Not much
>>>>> work has been done at all on machines with more than 2 processors.
>>>>> The system has been designed from the bottom up to be multi-threaded,
>>>>> but not much testing/monitoring has been done on 4 or more processor
>>>>> machines.  The following single-threading points exist in Derby:
>>>>>   o each user query is executed by a single thread.
>>>>>   o the locking system is protected by a single Java synchronization
>>>>> point.
>>>>>   o copying log records into the log is a single sync point
>>>>>   o finding a buffer in the buffer cache is a single sync point
>>>>> All of these seemed to be reasonable designs for 1, 2 and 4 way
>>>>> machines.
>>>>> /mikem
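Mike's description of the current post-commit mechanism (one background thread; when its queue gets too full, the thread that queued the work does it itself) has a close analogue in java.util.concurrent: a one-thread ThreadPoolExecutor with a bounded queue and ThreadPoolExecutor.CallerRunsPolicy. This is an analogy, not Derby's implementation, which lives in the daemon package mentioned above; the class and method names are hypothetical, and the queue capacity and the simulated 20 ms of work are arbitrary assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PostCommitQueueSketch {
    // Exactly one background thread and a bounded queue; when the queue is
    // full, CallerRunsPolicy runs the task on the thread that submitted it.
    static ThreadPoolExecutor newPostCommitExecutor(int queueCapacity) {
        return new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Stand-in for post-commit work such as reclaiming space from deleted rows.
    static void simulatedWork() {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Posts n units of work from the calling thread, then shuts the executor
    // down and waits. Returns how many units ran on the calling thread
    // because the queue was full when they were posted.
    static int postAndCountOverflow(ThreadPoolExecutor exec, int n) throws InterruptedException {
        Thread poster = Thread.currentThread();
        AtomicInteger ranByPoster = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            exec.execute(() -> {
                simulatedWork();
                if (Thread.currentThread() == poster) {
                    ranByPoster.incrementAndGet();
                }
            });
        }
        exec.shutdown();
        exec.awaitTermination(30, TimeUnit.SECONDS);
        return ranByPoster.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor exec = newPostCommitExecutor(2);
        int overflow = postAndCountOverflow(exec, 10);
        System.out.println(overflow + " of 10 units ran on the posting thread");
    }
}
```

In practice at least one of the ten units runs on the posting thread: the first unit goes straight to the background thread, the next two fill the two-slot queue, and the fourth finds the queue full while the background thread is still busy. The exact count varies with timing.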
>>>>> David Van Couvering wrote:
>>>>>> I noticed on the todo list there is a need to have more than one
>>>>>> background thread to enable better scalability with lots of client
>>>>>> connections.  I'm trying to find a way to gently work my way into
>>>>>> doing some work on Derby, and this seemed like a project of small
>>>>>> enough scope to get my feet wet.  Is there any background on this,
>>>>>> or should I just jump right in?  I didn't see any discussion of this
>>>>>> on the list...
>>>>>> Thanks,
>>>>>> David
