db-derby-dev mailing list archives

From David Van Couvering <David.Vancouver...@Sun.COM>
Subject Re: more background threads
Date Fri, 01 Apr 2005 01:45:15 GMT
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
  <title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
<br>
Satheesh Bandaram wrote:<br>
<blockquote cite="mid424CA432.6010100@Sourcery.Org" type="cite">
  <pre wrap="">Very interesting plans. What kind of application are you thinking of
running on these 4-way or 8 way machines? Will you be using embedded
driver or client-server network driver?
  </pre>
</blockquote>
<br>
We have a few products within Sun that are interested in using Derby as
their embedded data store.&nbsp; They like its lightweight nature and the
fact that it can be embedded.&nbsp; Sun of course has higher-scale machines
and customers who may want to scale up, so we want to make sure Derby
can handle that.&nbsp; I have to check, but I think there is interest in
both the embedded and network drivers.<br>
<br>
<blockquote cite="mid424CA432.6010100@Sourcery.Org" type="cite">
  <pre wrap="">
I am also wondering how is the scalability of Java VMs these days? I
know earlier VMs (a few years back) weren't scaling well beyond 3 or 4
CPUs. Is that any better with Sun or IBM VMs?
  </pre>
</blockquote>
Well, I don't personally have the details and exact numbers, but I know
a lot of work has been done to improve scalability of the VM.&nbsp; The
latest implementations at Sun use native threads and provide really
good parallel GC implementations.&nbsp; I know that our app server, which
is written entirely in Java, scales quite well.&nbsp; Generally we don't need to
run more than one instance of the app server per machine, even on the
much bigger Sun boxes.<br>
<br>
We also recently converted our web server to use the new NIO
package and the "select" model for handling incoming connections, and
now our all-Java web server scales better than our C-based web server,
which itself has been winning many of the performance benchmarks out
there.&nbsp; I was actually going to bring this up at some point as an idea
for a TODO: convert the network IO and potentially the
disk IO subsystems of Derby to use NIO.<br>
<br>
Cheers,<br>
<br>
David<br>
<blockquote cite="mid424CA432.6010100@Sourcery.Org" type="cite">
  <pre wrap="">
Satheesh

David Van Couvering wrote:

  </pre>
  <blockquote type="cite">
    <pre wrap="">Hi, Mike, thanks for the response and very helpful overview.  At first
blush it seems like the single daemon could easily be converted to a
thread pool approach where work is posted to a "dispatcher" that grabs
a thread and dispatches the work to it.  I say this without having yet
looked at the code, but in the meantime any reasons why this obviously
won't work would be much appreciated.
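
A minimal sketch of the dispatcher idea above, assuming the
java.util.concurrent package (Java 5 and later). The class and method
names are hypothetical and none of this is taken from the Derby daemon
code; it only shows work being posted and handed to a pooled thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical names throughout; this is not Derby's daemon code.
class PostCommitDispatcher {
    // Fixed pool of worker threads standing in for the single daemon thread.
    private final ExecutorService pool;

    PostCommitDispatcher(int workerThreads) {
        this.pool = Executors.newFixedThreadPool(workerThreads);
    }

    // Post a unit of work; an idle pool thread picks it up,
    // otherwise it waits in the pool's internal queue.
    void post(Runnable work) {
        pool.execute(work);
    }

    // Stop accepting work and wait for already-posted work to finish.
    // Returns false on timeout or if the waiting thread is interrupted.
    boolean shutdownAndWait(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Note that the queue behind newFixedThreadPool is unbounded, so this
sketch never pushes work back onto the posting thread. Whether that is
acceptable for post-commit work is an open question here.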

I can work on building up a test case that fills up the background
thread so we can "prove" that whatever solution we come up with helps
the system scale better.  I can post a test plan prior to actually
creating the test to see if you all agree the test looks to be what we
want it to be.

I can also look into testing Derby scalability on a 4-way or 8-way
machine, I think some of these are available in our lab.  I would also
like to do some testing on some of Sun's new multi-core chips, where
you have 8 threads per core and 4-8 cores per CPU.  Derby seems to be
well-suited to this architecture but it would be good to see if there
are any gotchas.  Again, I would propose these as plans first and get
your feedback.

What protocol do I use to identify this as a sub-project and
track its progress?  Do I create a JIRA item labelled as an
"improvement" and assign it to myself?

Thanks,

David

Mike Matrigali wrote:

    </pre>
    <blockquote type="cite">
      <pre wrap="">I have changed the subject, as I completely missed the original post
which had something to do with adding Junit tests.

I am not sure what is the right solution here, but getting a discussion
going would be good.

Currently a number of store actions are queued in "post commit" mode,
which means they should not be executed until after the transaction which
queued them commits.  Currently there is one background thread which
processes these; if its queue gets too full then the work is done by the
actual thread which queued the work.   Most of the post commit work involves
reclaiming space from deleted rows after their transaction commits.
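
As an analogy only, the behaviour described above (one background
thread, with overflow work run by the queuing thread) can be sketched
with java.util.concurrent. This assumes Java 5 or later; the class name
and queue capacity are hypothetical, and this is not how the Derby
daemon is actually implemented.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical names; an analogy, not Derby's implementation.
class BoundedPostCommitQueue {
    private final ThreadPoolExecutor daemon;

    // Raw queue type used for brevity, hence the suppressed warnings.
    @SuppressWarnings({"rawtypes", "unchecked"})
    BoundedPostCommitQueue(int queueCapacity) {
        // One background thread, a bounded queue, and CallerRunsPolicy:
        // when the queue is full, execute() runs the task on the
        // thread that called it instead of queuing it.
        this.daemon = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Queue post-commit work; if the queue is full the caller does it.
    void post(Runnable work) {
        daemon.execute(work);
    }

    // Stop accepting work and wait for queued work to drain.
    // Returns false on timeout or if the waiting thread is interrupted.
    boolean shutdownAndWait(long timeoutSeconds) {
        daemon.shutdown();
        try {
            return daemon.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a capacity of 1, a busy background thread and one queued item,
the next post() call runs its work on the posting thread.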

Going forward there is going to be a need for more background work.  I
soon will be posting the first phase of work to allow for returning
space back to the operating system, eventually it would be best if this
work was also done in background, somehow automatically queued by the
system.

I would also recommend coming up with a usage scenario which shows a
problem before coding up a solution.  I believe a test with lots of
users doing insert and delete should eventually show the background task
being bogged down -- but I am not sure if moving work to additional
threads is much better than just spreading the work out across the
existing user threads.

The code for the current background thread can be found in:
opensource/java/engine/org/apache/derby/impl/services/daemon

An example of one of the units of work put on the queue is in:
opensource/java/engine/org/apache/derby/impl/store/access/heap/heappostcommit.java


Dan is probably the person who most recently worked on this code, and
should have some comments in this area.  He should be back active on the
list early next week.

Note another interesting area of research/coding would be to see how
Derby scales on machines with a larger number of processors.  Not much work has
been done at all on machines with more than 2 processors.  The system
has been designed from bottom up to be multi-threaded, but not much
testing/monitoring has been done on 4 or more processor machines.   The
following single threading points exist in derby:
   o each user query is executed by a single thread.
   o the locking system is protected by a single Java synchronization
point.
   o copying log records into the log is a single sync point
   o finding a buffer in the buffer cache is a single sync point

All of these seemed to be reasonable designs for 1, 2 and 4 way
machines.

/mikem


David Van Couvering wrote:

 

      </pre>
      <blockquote type="cite">
        <pre wrap="">I noticed on the todo list there is a need to have more than one
background thread to enable better scalability with lots of client
connections.  I'm trying to find a way to gently work my way into doing
some work on Derby, and this seemed like a project of small enough
scope to get my feet wet.  Is there any background on this, or should I just
jump right in?  I didn't see any discussion of this on the list...

Thanks,

David

  
        </pre>
      </blockquote>
    </blockquote>
  </blockquote>
</blockquote>
</body>
</html>
