db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tiago Espinha <tiago.de...@yahoo.co.uk>
Subject Re: DRDA - An abridged tutorial
Date Fri, 09 Apr 2010 07:13:15 GMT
Oh! Wow, I'm happy that the video was useful to someone :)


----- Original Message ----
From: Jayaram Subramanian <rsjay1976@gmail.com>
To: derby-dev@db.apache.org
Sent: Fri, 9 April, 2010 2:56:53
Subject: Re: DRDA - An abridged tutorial

Thanks Tiago. Also special thanks to your derby installation video,
thrpugh which i installed derby in my pc


On Thu, Apr 8, 2010 at 2:49 PM, Tiago Espinha <tiago.derby@yahoo.co.uk> wrote:
> Dear all,
> Earlier today Kathey gave me a crash course on DRDA for my GSoC project. Jayaram asked
Kathey for a transcript of our conversation and Kathey suggested that I'd send it to the list,
so that other contributors could spot eventual mistakes. If you find any, please feel free
to chip in with your knowledge.
> Here goes:
> ---------------------8<----------------------
> <kmarsden> DRDA - Distributed Relational Database Architecture.
> <kmarsden> Basically it is a protocol that shuttles database requests from a client
over the network to a server.  The call the client an Application Requester and the server
an Application Server.
> <kmarsden> The Application Server term predates what we think of as an Application
Server and has nothing to do with it.
> <etiago> ok
> <kmarsden> So as we discussed, the derby client JDBC Driver (our Application Requester)
converts JDBC calls into DRDA, sends the DRDA accross the network to Network Server (our Application
Server) which converts them back into JDBC which it sends to the embedded driver.
> <kmarsden> So the whole thing is a JDBC to DRDA to JDBC converter.  This way we
meet the requirement of having multiple jvms on multiple machines accessing a single embedded
database, because everything gets routed through the network server process.
> <kmarsden> Make sense?
> <etiago> it does
> <kmarsden> DRDA is mostly associated historically with DB2 but there are actually
a lot of licensed DRDA vendors, including Microsoft
> <etiago> so let me try to establish a comparison here
> <etiago> DRDA is sort of a platform-agnostic way of transferring database requests
over the network, is this right?
> <etiago> something like XML
> <kmarsden> yes. That's right.   It can go over TCP/IP or something called SNA,
but yes it is platform-agnostic. As you may guess I think it started on the mainframe with
EBCDIC encoding.
> <kmarsden> Having a standard protocol has allowed even for a single client to be
shared amongst several databases.
> <kmarsden> For example the IBM DB2 Universal JDBC Driver (JCC) used to be used
with Derby and is still used with Informix as well as of course many flavors of DB2.
> <kmarsden> DRDA iis also used with the DB2 C client. Not just JDBC
> <etiago> I see
> <kmarsden> In practice it is not as portable as you would like. It is generally
a lot of work to get one product's DRDA client working with a different server.
> <etiago> ok, is DRDA just plaintext? I was looking at the trace and I saw that
there's an hexadecimal, ASCII and EBCDIC representations of the data
> <kmarsden> The encoding for data can be described in the protocol flow. Up until
now the DDM commands and parameters (which we will discuss in a bit) were all in EBCDIC.
> <etiago> ok
> <kmarsden> UNICODEMGR allows them (except for the earliest) to be in UTF-8
> <kmarsden> So there are three volumes to the manuals. Volume 1 DRDA describes the
protocol flow.
> <kmarsden> It shows the commands and what order they flow in from AR to AS and
> <kmarsden> But to read it you need to look at Volume 2, which describes the commands
and objects that are being flowed in detail
> <kmarsden> The DDM manual shows a 2 byte "Codepoint" for each one of these commands
or objects and that is how they are identified.
> <etiago> ah, kind of counterintuitive then #:)
> <kmarsden> You will see Codepoint.java class in both client and server that lists
the codepoints in Derby
> <etiago> I did look at that and all it reminded me of was pointers
> <kmarsden> Vol 3 describes the format for the data types.
> <etiago> so DRDA also has data types of its own?
> <kmarsden> I can't say I have ever actually used Vol 3 or understood it.  That
work was done pretty early before even my time.  I spend my time mostly in Vol 2 and Vol 1
> <etiago> oh alright
> <kmarsden> Yes, it does and I think at two levels.  For example DRDA itself in
Volume 1 will talk about Varchar and Char types but Vol 3 just talks about strings in general.
 But we'll move on because this is not where you will be working.
> <etiago> ok
> * kasun (~kasun@ entrou em #derby
> * kasun agora chama-se Guest50421
> <kmarsden> Ok.  All this protocol flow and DDM objects are wrapped in something
called a Data Stream Structure (DSS) which is just really a packet wrapper and has no real
content except descxribing the length and chaniing one DSS to another.  You will see these
in the traces starting with a D0 and identifes the packet as a request a reply or an object.
> <kmarsden> I only mention that really because you will see DSS referred to all
over the place. I don't think you will have to change that code either.
> <etiago> yes, I've seen DSS stuff around
> <kmarsden> Now lets look at the actual protocol.  When a connection is made by
the AR the first thing that is sent is an EXCSAT
> <kmarsden> Exchange Server Attributes Request
> <etiago> yes, I see that on the trace
> <kmarsden> This even with UNICODEMGR will be sent EBCDIC and is really important
because it is where the protocol levels and now encoding are negotiated.
> <etiago> as I understand, we keep this as EBCDIC to allow clients that don't support
UTF-8 to ask for normal EBCDIC rather than UTF-8
> <kmarsden> Right. If you look at the DDM manual which is alphabetical at EXCSAT,
you will find out about it.,
> <kmarsden>
> <etiago> ok
> <kmarsden> Generally with EXCSAT you send an external name identifying the client,
a version and a manager level list.
> <kmarsden> It is the manager level list or MGRLVLLS where the protocol negotiation
and now encoding takes place.
> <kmarsden> oh no, I switched computers and don't have my spec to look at so I wlll
go from memory.
> <kmarsden> Most of the Manager levels like SQLAM for instance are just a number
1- whatever to show the protocol level, but UNICODEMGR the one being introduced for ACR7007
is a bit different.
> <kmarsden> For that one the client will send 0 if it wants to continue with EBCDIC
or 1208 if it wants UTF-8.  I am not sure exactly why it was set up this way, maybe to allow
different encodings in the future? I just don't know
> <etiago> ok yeah, that sort of makes sense
> <kmarsden> The next things that flow from the AR to the AS are ACCEC and SECCHK
to negotiate security. These are chained before receiving the EXCSATRM so these are in EBCDIC
> <kmarsden> SECCHK (or maybe ACCSEC) has an optional RDBNAM parameter which Derby
client used to send.
> <etiago> that's the database name if I recall correctly
> <kmarsden> Some time ago I took that out in preparation of DERBY-728. We now do
not send the RDBNAM until it is required in ACCRDB
> <kmarsden> right.
> <etiago> oh by removing it from the SECCHK (which is EBCDIC-only) we can actually
support database names with chinese and japanese characters
> <kmarsden> Anyway after the AR flows EXCSAT, ACCSEC and SECCHK then the server
sends back an EXCSATRM with it's manager level list and then the AR and AS settle on the level
> <kmarsden> right.
> <etiago> ok what exactly is this level that they settle on?
> <kmarsden> For each level, for instance UNICODEMGR, if the AR sent 1208 and the
AS sent back 0, they would settle on 0 all EBCDIC.
> <kmarsden> I suppose it is actually the server that takes what the client has provided
and returns the actual level that it will be.
> <etiago> oh ok
> <kmarsden> If the server could handle 1208 but the client said 0, it would be 0.
> <kmarsden> At least I think that's how it works. You better check
> <etiago> that makes sense but I will check
> <kmarsden> Also if the client knows nothing about this new manager level the server
will dumb things down,
> <kmarsden> For example if a 10.3 client tries to connect to a 10.6 server after
DERBY-728 has been implemented, EBCDIC it will be, UNICODEMGR level 0
> <kmarsden> make sense?
> <etiago> yep we have to maintain compatibility
> <etiago> by using what's common to everyone
> <kmarsden> right
> <kmarsden> So let's say you have implemented DERBY-728 and are using new client
and server.   AR sent 1208, AS responded 1208 for UNICODEMGR.   It is with the next request
ACCRDB that the encoding changes.
> <kmarsden> It is with this request that we now finally send the RDBNAM in UTF-8
so we can send Chinese. Hooray!
> <etiago> yay :-)
> <kmarsden> But there is gotcha of course.  The DRDA character strings have limits
and not just character limits but byte limits.
> <etiago> meaning that the arguments of ACCRDB, etc have certain byte length limitations,
> <kmarsden> right. It is really very sad because from a Derby perspective, these
are just arbitrary limits and since the database name is a often a full file system path and
now characters can take up to four bytes I am thinking folks are going to run out.
> <etiago> yeah, I suppose these limits are defined somewhere in those DRDA documents?
> <kmarsden> but I understand extending or eliminating 1) would require a new opengroup
ACXR which seems to take years and 2) will be a total no go in the C world.
> <kmarsden> right.  I think RDBNAM is 255 bytes
> <etiago> actually, about that, is this ACR actually approved?
> <kmarsden> It is not.  It is undergoing the "fast track" approval process at opengroup
with some other ACR's.  Work on it within IBM completed over a year ago, but they were waiting
for a bunch of others to be ready to submit
> <etiago> gotta love bureaucracy :-)
> <kmarsden> But I think it is pretty solid.  The work you do should be for 10.7,
not 10.6
> <kmarsden> so we'll be safe just in case.
> <etiago> alright
> <kmarsden> I should mention that this ACR was not originally for the purpose we
are using it for.  It was originally meant  some sort of performance improvement to avoid
converting to EBCDIC all the time.
> <kmarsden> At least 6 times that I can remember someone suggested moving the switch
to UTF-8 later after ACCRDB, so I have scouts at opengroup that promise they will send up
a big red flag if somebody tries i again.
> <etiago> hahaha
> <kmarsden> anyway moving along.
> <etiago> that's good
> <kmarsden> So what I have covered is the part of the protocol that you will be
dealing with mostly. I will move on with some other protocol stuff but mostly for general
knowledge. Then some other time we can have a talk about the Derby code in relation to the
> <etiago> ok sounds good
> <kmarsden> So when Network server receives the ACCRDB it makes an embedded connection
to the database which it will use for all the requests on that client connection.  The simplest
thing that might come through is an EXCSQLIMM which is just a simple statement execution with
no result set and not a prepared statement, maybe just an update or delete.
> <etiago> yep I got that in my trace: [derby]        SEND BUFFER: EXCSQLIMM    
         (ASCII)           (EBCDIC)
> <kmarsden> Most statements need to be prepared. Even if they are not a JDBC prepared
statement, if they return a result set, they have to be prepared with a PRPSQLSTT
> <kmarsden> The PRPSQLSTT command takes a package name, consistency token and section
number.  The package name and section number are used to identify the statement later when
it is executed.
> <kmarsden> In DB2 these are actual things and there are packages related to holdability
and other things I can't remember and the section is the statement within that category.
> <etiago> hmm ok
> <kmarsden> In Derby, we just sort of pretend, because we don't actually have packages.
 But these are how the statements are identified and Network Server keeps a hash table of
the prepared statements keyed on package and section number for retrieval.
> <kmarsden> An interesting way to look at this is to start network server and then
go into ij and prepare a statement or two  then  run runtimeinfo to see the statements, their
packages and section numbers.
> <etiago> ok
> <kmarsden> After PRPSQLSTT is the SQLSTT which is the actual SQL statement text.
This I believe can already be encoded UTF-8 even without ACR7007, so you can have international
characters in SQL even now.
> <kmarsden> Once the statement is prepared, you execute it with ECSQLSTT.  Of course
when you do this, you send again the package name, consitency token and section number so
the server can look it up and match it to the prepared statement
> <etiago> hmm but is it actually done right now in Derby?
> <kmarsden> SQLDTA contains the parameter data and may just have place holders for
large objects which are sent in an EXTDTA
> <etiago> ok
> <kmarsden> If the statement is a query, the AR sends an OPNQRY, and gets an OPNQRYRM
and then iterates with CNTQRY (sort of like next but for many rows) and gets QRYDTA and possibly
EXTDTA objects back with the data.
> <kmarsden> En the end the server sends a ENDQRYRM and a SQLCARD indicating the
end of data.
> <kmarsden> The SQLCARD is an interesting thing. Generally it is used for SQLExceptions,
but there is a special one for end of data.
> <etiago> so a simple SELECT statement actually represents several DRDA commands,
> <kmarsden> right, but not nearly as many as in that book.  I still don't understand
why it's so big. The ones I mentioned here are pretty much the pages you will go to over and
over again
> <etiago> hehe fair enough :)
> <kmarsden> When the client connection is all done, there is no final end connection
command that it sends, which really irritating. The socket is closed on the client and the
server has to detect that and clean up. This has lead to many leak issues over the years.
Maybe someday there will be an actual command which will make things easier.
> <etiago> maybe another ACR is due
> <kmarsden> So I guess that's the end of the protocol story for today, unless you
have questions.
> <kmarsden> I wonder too if dag is here and  if so how many lies he can spot.
> <etiago> haha I think that covers it for now, I will have questions over time I'm
> <kmarsden> Yes, maybe you can write the ACR's for extending the character string
lengths and a connection termination command.  It will look nice on your resume and nice for
opengroup to have input from a more diverse group.
> <etiago> maybe #:)
> <etiago> maybe I still have a too simplistic insight of DRDA on Derby, but I think
this will all boil down to detect the level of the encoding and instantiate the right encoder
class based on that
> <kmarsden> I am a bit torn sometimes about whether we should just branch out since
we have our own ciient.
> <kmarsden> Yes, that sounds right.
> ------------------------------------8<----------------------
> Regards,
> Tiago Espinha


View raw message