db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: more background threads
Date Fri, 01 Apr 2005 23:48:14 GMT
Any expertise you can lend in the area of I/O or scalability would
be greatly appreciated.  I only have access to a dual-processor
machine (and it only has two 700 MHz CPUs).  I am currently only guessing
at where the problems might be; if you can identify problems,
then projects should flow from that.

Suresh and I looked at NIO for log I/O over a year ago and did not
see any improvement on Intel/Windows and/or Intel/Linux.  I thought
that we could avoid some OS/JVM buffer copies by using it, but we
never measured the difference.  If you can get log or database I/O to
go faster using it, that would be great.
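For anyone who wants to measure this, here is a minimal sketch of what an NIO-based log appender might look like. It is not Derby's log code: the class `NioLogAppender` and its methods are hypothetical. It assumes that buffering records in a direct `ByteBuffer` and writing them through a `FileChannel` is the path that may avoid an extra heap-to-native copy, and it uses `FileChannel.force(false)` to stand in for the sync at commit. Whether any of this is actually faster is exactly what would need benchmarking.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical log appender: buffers records in a direct buffer, writes via FileChannel. */
public final class NioLogAppender implements AutoCloseable {
    private final FileChannel channel;
    // Direct (off-heap) buffer. Heap buffers are typically copied into a temporary
    // native buffer before the OS write; a direct buffer may skip that copy.
    private final ByteBuffer buffer;

    public NioLogAppender(Path file, int bufferSize) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        buffer = ByteBuffer.allocateDirect(bufferSize);
    }

    /** Copies one log record into the buffer, writing the buffer out first if the record won't fit. */
    public void append(byte[] record) throws IOException {
        if (record.length > buffer.capacity()) {
            throw new IllegalArgumentException("record larger than log buffer");
        }
        if (record.length > buffer.remaining()) {
            flush(false);
        }
        buffer.put(record);
    }

    /** Writes buffered bytes; with sync = true, also forces them to the device (as at commit). */
    public void flush(boolean sync) throws IOException {
        buffer.flip();
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
        buffer.clear();
        if (sync) {
            channel.force(false); // false: file content only, metadata need not be synced
        }
    }

    @Override
    public void close() throws IOException {
        flush(true);
        channel.close();
    }
}
```

A fair comparison would time the same record stream through a `RandomAccessFile`-based writer and through this one, with the same sync frequency.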

Note that many standard performance benchmarks will quickly become
I/O bound on Derby unless the hardware is configured optimally.  Unlike
most databases, Derby requires the database to be on a single device and the
log to be on a single device.  The assumption is that applications needing
better throughput can use the hardware or OS to spread I/O across multiple
devices while presenting the view of a single device to the JVM, and thus to Derby.

David Van Couvering wrote:
> Satheesh Bandaram wrote:
>>Very interesting plans. What kind of application are you thinking of
>>running on these 4-way or 8-way machines? Will you be using the embedded
>>driver or the client-server network driver?
> We have a few products within Sun who are interested in using Derby as 
> their embedded data store.  They like its lightweight nature and the 
> fact that it can be embedded.  Sun of course has higher-scale machines 
> and customers who may want to scale up, so we want to make sure Derby 
> can handle that.  I have to check, but I think there is interest in both 
> the embedded and network drivers.
>>I am also wondering how well Java VMs scale these days. I
>>know earlier VMs (a few years back) weren't scaling well beyond 3 or 4
>>CPUs. Is that any better with the Sun or IBM VMs?
> Well, I don't personally have the details and exact numbers, but I know 
> a lot of work has been done to improve scalability of the VM.  The 
> latest implementations at Sun use native threads and provide really good 
> parallel GC implementations.   I know that our app server, which is 
> written entirely in Java, scales quite well.  Generally we don't need to run 
> more than one instance of the app server per machine on even the much 
> bigger Sun boxes. 
> We also recently converted our web server to use the new NIO 
> package and the "select" model for handling incoming connections, and 
> now our all-Java web server scales better than our C-based web server, 
> which itself has been winning many of the performance benchmarks out 
> there.  I was actually going to bring this up at some point as an idea 
> for a TODO: convert the network I/O, and potentially the 
> disk I/O, subsystems of Derby to use NIO... 
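A minimal sketch of the "select" model mentioned above, using only `java.nio.channels`: one thread multiplexes every connection through a `Selector` instead of dedicating a thread to each connection. The class `SelectEchoServer` is hypothetical and simply echoes bytes back; it has nothing to do with Derby's network server protocol. Partial writes are handled only loosely (unsent bytes stay in the per-connection buffer until the next read event); a real server would register for `OP_WRITE` instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

/** Hypothetical single-threaded echo server built on the NIO select model. */
public final class SelectEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    public SelectEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0 = any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    public void stop() {
        running = false;
        selector.wakeup(); // unblock select() so the loop can exit
    }

    @Override
    public void run() {
        try {
            while (running) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            // Each connection keeps its own buffer as the key attachment.
                            client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buf = (ByteBuffer) key.attachment();
                        if (client.read(buf) < 0) { // peer closed the connection
                            key.cancel();
                            client.close();
                            continue;
                        }
                        buf.flip();
                        client.write(buf); // non-blocking: may write only part
                        buf.compact();     // keep any unsent bytes for later
                    }
                }
            }
            selector.close();
            server.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```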
> Cheers,
> David
>>David Van Couvering wrote:
>>>Hi, Mike, thanks for the response and very helpful overview.  At first
>>>blush it seems like the single daemon could easily be converted to a
>>>thread pool approach where work is posted to a "dispatcher" who grabs
>>>a thread and dispatches the work to it.  I say this without having yet
>>>looked at the code, but in the meantime any reasons why this obviously
>>>won't work would be much appreciated.
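As a rough illustration of the dispatcher idea, `java.util.concurrent` already provides the "grab a thread and hand it the work" part as a fixed-size thread pool. This sketch assumes a JDK with `java.util.concurrent` and lambdas; the class name `PostCommitDispatcher` is hypothetical and does not correspond to existing Derby code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical dispatcher that hands posted work to a fixed pool of daemon threads. */
public final class PostCommitDispatcher {
    private final ExecutorService pool;

    public PostCommitDispatcher(int threads) {
        AtomicInteger seq = new AtomicInteger();
        pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "post-commit-" + seq.incrementAndGet());
            t.setDaemon(true); // don't keep the JVM alive just for background work
            return t;
        });
    }

    /** Queues work; any idle pool thread picks it up. */
    public void post(Runnable work) {
        pool.execute(work);
    }

    /** Stops accepting work and waits for already-queued work to finish. */
    public boolean shutdownAndWait(long timeoutMillis) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

One design difference worth noting: `newFixedThreadPool` uses an unbounded queue, whereas the existing scheme falls back to the user thread when the queue fills, so a real design would still need a policy for when work piles up.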
>>>I can work on building up a test case that fills up the background
>>>thread so we can "prove" that whatever solution we come up with helps
>>>the system scale better.  I can post a test plan prior to actually
>>>creating the test to see if you all agree the test looks to be what we
>>>want it to be.
>>>I can also look into testing Derby scalability on a 4-way or 8-way
>>>machine; I think some of these are available in our lab.  I would also
>>>like to do some testing on some of Sun's new multi-core chips, where
>>>you have 8 threads per core and 4-8 cores per CPU.  Derby seems to be
>>>well-suited to this architecture but it would be good to see if there
>>>are any gotchas.  Again, I would propose these as plans first and get
>>>your feedback.
>>>What protocol do I use to "identify" this as a sub-project and
>>>track its progress?  Do I create a JIRA item labelled as an
>>>"improvement" and assign it to myself?
>>>Mike Matrigali wrote:
>>>>I have changed the subject, as I completely missed the original post
>>>>which had something to do with adding JUnit tests.
>>>>I am not sure what is the right solution here, but getting a discussion
>>>>going would be good.
>>>>Currently a number of store actions are queued in "post commit" mode,
>>>>which means they should not be executed until after the transaction which
>>>>queued them commits.  Currently there is one background thread which
>>>>processes these; if its queue gets too full, the work is done by the
>>>>thread which queued it.  Most of the post commit work involves
>>>>reclaiming space from deleted rows after their transaction commits.
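The scheme just described (one background thread, with the queuing thread doing the work itself when the queue is full) can be modelled roughly as follows. This is a hypothetical model for discussion, not Derby's actual daemon code; `SingleDaemonQueue`, its capacity parameter, and the 50 ms poll interval are all assumptions made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical model: one background thread plus a bounded queue of post-commit work. */
public final class SingleDaemonQueue {
    private final BlockingQueue<Runnable> queue;
    private final Thread daemon;
    private volatile boolean stopping = false;

    public SingleDaemonQueue(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
        daemon = new Thread(this::serviceLoop, "post-commit-daemon");
        daemon.setDaemon(true);
        daemon.start();
    }

    /**
     * Called after the transaction that generated the work has committed.
     * Returns true if the daemon will run it, false if the caller ran it inline.
     */
    public boolean post(Runnable work) {
        if (queue.offer(work)) {
            return true;
        }
        work.run(); // queue full: the posting (user) thread does the work itself
        return false;
    }

    private void serviceLoop() {
        try {
            // Keep draining until asked to stop and nothing is left.
            while (!stopping || !queue.isEmpty()) {
                Runnable work = queue.poll(50, TimeUnit.MILLISECONDS);
                if (work != null) {
                    work.run();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the daemon after it drains the queue; call once all posting is done. */
    public void stopAndDrain() throws InterruptedException {
        stopping = true;
        daemon.join();
    }
}
```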
>>>>Going forward there is going to be a need for more background work.  I
>>>>will soon be posting the first phase of work to allow for returning
>>>>space back to the operating system; eventually it would be best if this
>>>>work were also done in the background, somehow automatically queued by the
>>>>I would also recommend coming up with a usage scenario which shows a
>>>>problem before coding up a solution.  I believe a test with lots of
>>>>users doing inserts and deletes should eventually show the background task
>>>>getting bogged down, but I am not sure if moving work to additional
>>>>threads is much better than just spreading the work out across the
>>>>existing user threads.
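One way to check whether the background task gets bogged down, before touching Derby itself, is a small stand-alone harness: several "user" threads each post many simulated reclaim tasks to a single background thread with a bounded queue, and the harness counts how many tasks the background thread ran versus how many fell back onto the user threads. This sketch uses the JDK's `ThreadPoolExecutor` with `CallerRunsPolicy` as a stand-in for the store's queue; `BacklogHarness`, the queue capacity, and the simulated task cost (`parkNanos`) are assumptions, and a real test would perform actual inserts and deletes through JDBC.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical harness: how much "post commit" work spills back onto user threads? */
public final class BacklogHarness {
    static final String DAEMON_NAME = "background-daemon";

    /** Returns {ranByDaemon, ranByUserThreads}. */
    public static int[] run(int users, int tasksPerUser, int queueCapacity, long taskNanos)
            throws InterruptedException {
        AtomicInteger byDaemon = new AtomicInteger();
        AtomicInteger byUsers = new AtomicInteger();
        // One worker thread + bounded queue; when the queue is full, the caller runs the task.
        ThreadPoolExecutor daemon = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                r -> new Thread(r, DAEMON_NAME),
                new ThreadPoolExecutor.CallerRunsPolicy());

        Runnable task = () -> {
            LockSupport.parkNanos(taskNanos); // stand-in for reclaiming space
            if (Thread.currentThread().getName().equals(DAEMON_NAME)) {
                byDaemon.incrementAndGet();
            } else {
                byUsers.incrementAndGet();
            }
        };

        Thread[] userThreads = new Thread[users];
        for (int i = 0; i < users; i++) {
            userThreads[i] = new Thread(() -> {
                for (int j = 0; j < tasksPerUser; j++) {
                    daemon.execute(task); // simulated "delete + commit" queues reclaim work
                }
            });
            userThreads[i].start();
        }
        for (Thread t : userThreads) {
            t.join();
        }
        daemon.shutdown();
        daemon.awaitTermination(1, TimeUnit.MINUTES);
        return new int[] {byDaemon.get(), byUsers.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run(8, 200, 16, 200_000L);
        System.out.println("background thread ran " + r[0] + ", user threads ran " + r[1]);
    }
}
```

If the user-thread count grows as more users are added, the single background thread is not keeping up; whether extra background threads would do better than simply letting user threads absorb the work is the open question above.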
>>>>The code for the current background thread can be found in:
>>>>An example of one of the units of work put on the queue is in:
>>>>Dan is probably the person who most recently worked on this code, and
>>>>should have some comments in this area.  He should be back active on the
>>>>list early next week.
>>>>Note that another interesting area of research/coding would be to see how
>>>>Derby scales on machines with larger numbers of processors.  Not much work has
>>>>been done at all on machines with more than 2 processors.  The system
>>>>has been designed from the bottom up to be multi-threaded, but not much
>>>>testing/monitoring has been done on machines with 4 or more processors.  The
>>>>following single-threading points exist in Derby:
>>>>   o each user query is executed by a single thread
>>>>   o the locking system is protected by a single Java synchronization point
>>>>   o copying log records into the log is a single sync point
>>>>   o finding a buffer in the buffer cache is a single sync point
>>>>All of these seemed to be reasonable designs for 1-, 2- and 4-way machines.
>>>>David Van Couvering wrote:
>>>>>I noticed on the todo list there is a need to have more than one
>>>>>background thread to enable better scalability with lots of client
>>>>>connections.  I'm trying to find a way to gently work my way into doing
>>>>>some work on Derby, and this seemed like a project of small enough scope
>>>>>to get my feet wet.  Is there any background on this, or should I just
>>>>>jump right in?  I didn't see any discussion of this on the list...
