Message-ID: <20060629090926.zkraogsvogcoo4sg@www.rexursive.com>
Date: Thu, 29 Jun 2006 09:09:26 +1000
From: Bojan Smojver <bojan@rexursive.com>
To: dev@apr.apache.org
Subject: Re: Binary data in apr dbd - where should buckets come from
References: <20060628051313.56816.qmail@web36714.mail.mud.yahoo.com>
In-Reply-To: <20060628051313.56816.qmail@web36714.mail.mud.yahoo.com>

Quoting Alex Dubov <oakad@yahoo.com>:

> I was working on my long pending changes to
> apr_dbd_mysql when I found that I have no clue how to
> return buckets from apr_dbd_get_entry.
> Some problems:
> 1. Which bucket_alloc should I use and who gets it
> (apr_dbd_get_row or apr_dbd_get_entry)?
>
> 2. There's some gains in having pool passed to
> apr_dbd_get_entry (to defer the check for truncation
> until the value is really needed). In general, it may
> be better to hold a pool passed to apr_dbd_pselect,
> instead of passing a new one to get_row/get_entry.
>
> 3. Alternatively, I can use buckets for everything
> (including strings and simple types). In this case,
> apr_dbd_p(v)select should get bucket_alloc instead.
> apr_dbd_get_entry then may choose to return pointer to
> bucket or pointer to bucket's content.

I would leave all the functions we have now the way they are - they
should just take and return const char *, but should encode in ASCII
for types like BLOBs (i.e. length:[column:table:]payload business). We
already have this interface (i.e. the "string way") and it is handy,
so I think we should keep it.
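
As a rough illustration of the "string way" for BLOBs, the sketch
below splits the short length:payload form into its two parts. The
function name sketch_blob_payload() is hypothetical, the optional
column:table part is deliberately not handled, and how the payload
itself would be ASCII-encoded is left open here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not an APR function: given "length:payload",
 * store the decimal length and return a pointer to the payload.
 * Returns NULL if the entry does not start with "<digits>:".
 */
static const char *sketch_blob_payload(const char *entry, size_t *length)
{
    char *end;
    unsigned long n;

    assert(entry != NULL && length != NULL);

    if (*entry < '0' || *entry > '9')
        return NULL;                /* must start with a digit */

    errno = 0;
    n = strtoul(entry, &end, 10);
    if (errno != 0 || *end != ':')
        return NULL;                /* overflow or missing separator */

    *length = (size_t)n;
    return end + 1;                 /* first byte after the ':' */
}
```

The caller would still have to check that the payload really is as
long as the prefix claims before using it.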

We should introduce new functions for all binary stuff. As I presented
in one of my previous e-mails to Chris:

-----------------------------------------------------
As for strings v. natives v. structures (i.e. your point 6), I think we
should handle this by having a whole set of new functions for this
purpose. I would personally keep the existing functions the way they
are, because:

- passing strings in/out for everything is handy
- it ensures backward compatibility

I would enhance this behaviour for existing functions to understand a
few more things, like floats, doubles, longs, shorts, timestamps (all
passed in as strings) and BLOBs (ASCII encoded, as I originally
suggested). This would give us a whole lot of "strings only" stuff to
work with. Not sure if formatting strings in apr_vformatter should
really be related to SQL data type info we are passing in/out here, but
if the list thinks this is the right way to go, I have no problem with
it.

So, now that we have the "strings" API out of the way, I think we should
also introduce a new "binary" API for native data types. First, I would
keep the _prepare identical for both "strings" and "binary" interface.
The formats used should be able to cater for both. In this phase, we
just "hint" what is to be expected, but not "hardcode" anything. Then,
we can have "binary" equivalents of p[v]query/select and get_entry. Here
is how I see them working:

All "simple" types like int, long, float, double etc. are passed in by
pointer only. No need to employ any kind of wrapper structure - lengths
and types are known by the compiler and we can map those directly to SQL
types too. Some other types, like timestamps, dates and times are
probably best passed as their string representations, as this is what
SQL backends can work with, as well as C native APIs, through conversion
functions. BLOBs and such could be passed in through a structure
defining all required elements (length and binary data), including the
infamous column/table info for Oracle. The binary equivalent of
get_entry would then return relevant pointers. The caller already knows
what that is - he/she is the one doing this in the first place, no need
to wrap all this with unneeded info.

Basically, I'm trying to take the shortest path from A to B. If we can
pass native as is, we do. If we can "cheat" by using strings, we do. For
everything else, we do "proper". In other words, if it needs wrapping,
we wrap.

So, the caller can follow one of two paths:

apr_dbd_prepare()
apr_dbd_p[v]query/select() --> takes all args as strings
apr_dbd_get_row()
apr_dbd_get_entry() --> returns all strings

or

apr_dbd_prepare()
apr_dbd_bp[v]query/select() --> takes various pointer args
apr_dbd_get_row()
apr_dbd_get_bentry() --> returns void *

Obviously, we'd have some meaningful function names for all this.
-----------------------------------------------------
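
To make the argument passing in the quoted message a bit more
concrete, here is a minimal sketch of the "binary" side: simple types
go in as bare pointers and only the BLOB gets a wrapper structure.
All sketch_* names, the field layout and the byte-counting "driver"
are assumptions made for illustration; nothing here is an existing
APR interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper for values that need more than a pointer. */
typedef struct {
    const void *data;    /* binary payload */
    size_t      length;  /* payload size in bytes */
    const char *table;   /* optional (e.g. Oracle), NULL if unused */
    const char *column;  /* optional (e.g. Oracle), NULL if unused */
} sketch_blob_t;

/* Hypothetical argument kinds, as hinted at prepare time. */
typedef enum {
    SKETCH_ARG_INT,
    SKETCH_ARG_DOUBLE,
    SKETCH_ARG_BLOB
} sketch_arg_e;

/*
 * Stand-in for a driver: walk the pointer arguments and add up how
 * many bytes would be handed to the backend. Simple types need no
 * wrapper because their size follows from the type; the BLOB
 * carries its own length.
 */
static size_t sketch_bound_bytes(const sketch_arg_e *types,
                                 const void *const *args, size_t nargs)
{
    size_t total = 0;
    size_t i;

    assert(types != NULL && args != NULL);

    for (i = 0; i < nargs; i++) {
        switch (types[i]) {
        case SKETCH_ARG_INT:
            total += sizeof(int);
            break;
        case SKETCH_ARG_DOUBLE:
            total += sizeof(double);
            break;
        case SKETCH_ARG_BLOB:
            total += ((const sketch_blob_t *)args[i])->length;
            break;
        }
    }
    return total;
}
```

A caller would fill an array of pointers (e.g. &id, &price, &blob) in
the same order as the parameters in the prepared statement.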

I think the first order of business would be to put parsing of SQL
queries into the public function (i.e. apr_dbd_prepare()), so that
this part is always done exactly the same way for all backends. We
could then pass an extra argument to the underlying driver function
(this wouldn't break binary compatibility, as those functions "don't
exist" from the caller's point of view), which would be a "pointer to
an array of types of parameters" that we parsed, expressed in DBD
speak (i.e. we could have an enumerated type for this).

Once the driver functions get this, all they need to do is prepare the
statement accordingly (i.e. in the backend specific way).
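
The common parsing step could look roughly like the sketch below: the
query is scanned once in driver-independent code and the result is an
array of enumerated types that would be handed to the driver. The
enum sketch_type_e, the function sketch_parse_types() and the format
letters chosen here (%d, %f, %s, %b) are all assumptions for the sake
of the example, not an agreed format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DBD-level parameter types (names invented here). */
typedef enum {
    SKETCH_TYPE_INT,     /* %d */
    SKETCH_TYPE_DOUBLE,  /* %f */
    SKETCH_TYPE_STRING,  /* %s */
    SKETCH_TYPE_BLOB     /* %b */
} sketch_type_e;

/*
 * Record the type of each parameter found in the query. "%%" is a
 * literal percent sign. Returns the number of parameters, or -1 on
 * an unknown format letter or if more than max parameters occur.
 */
static int sketch_parse_types(const char *query, sketch_type_e *types,
                              size_t max)
{
    size_t n = 0;
    const char *p;

    assert(query != NULL && types != NULL);

    for (p = query; *p != '\0'; p++) {
        sketch_type_e t;

        if (*p != '%')
            continue;
        p++;                        /* look at the format letter */
        switch (*p) {
        case '%': continue;         /* escaped percent sign */
        case 'd': t = SKETCH_TYPE_INT;    break;
        case 'f': t = SKETCH_TYPE_DOUBLE; break;
        case 's': t = SKETCH_TYPE_STRING; break;
        case 'b': t = SKETCH_TYPE_BLOB;   break;
        default:  return -1;        /* also catches a trailing '%' */
        }
        if (n == max)
            return -1;              /* caller's array is too small */
        types[n++] = t;
    }
    return (int)n;
}
```

Each driver would then only translate the array into its own bind
calls, instead of parsing the query text itself.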

Then, [b]p[v]select/query functions can fetch arguments either as
const char * (the "string way") or as other types of pointers (the
"binary way") and use them. Finally, the get_[b]entry can return
either const char * (the "string way") or other type of pointer (the
"binary way"). And since we know what underlying SQL types map to in C
land (because that's what our "binary" interface definition is all
about), we don't need any "formatting" for fetching. The caller
already knows what's going to come back - he/she defined the SQL
columns after all.
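
On the fetching side, the difference between the two ways could be
sketched like this. The row type and both accessors are hypothetical
stand-ins (and return const pointers for safety); the point is only
that the binary accessor needs no format information, because the
caller casts to the type it already knows the column has.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_COLS 4

/* Hypothetical row keeping each column in both representations. */
typedef struct {
    const char *as_string[SKETCH_MAX_COLS];  /* "string way" */
    const void *as_native[SKETCH_MAX_COLS];  /* "binary way" */
    size_t      ncols;
} sketch_row_t;

/* "String way": every column comes back as const char *. */
static const char *sketch_get_entry(const sketch_row_t *row, size_t col)
{
    assert(row != NULL);
    return col < row->ncols ? row->as_string[col] : NULL;
}

/* "Binary way": an untyped pointer that the caller casts itself. */
static const void *sketch_get_bentry(const sketch_row_t *row, size_t col)
{
    assert(row != NULL);
    return col < row->ncols ? row->as_native[col] : NULL;
}
```

For an INT column the caller would write something like
*(const int *)sketch_get_bentry(row, 0), with no conversion from text.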

Ah yes, the buckets/brigades... I'd use them only when required - for
types like BLOB and maybe binary TEXT, where we may need to get stuff
in multiple chunks due to size.

At least that's my take...

--
Bojan