db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: Question about TableFunctions in Derby
Date Mon, 20 Jul 2009 15:55:33 GMT
Hi Chris,

Some comments inline...

Chris Goodacre wrote:
> I've read the Derby developer's guide and Rick Hillegas's informative white paper (http://developers.sun.com/javadb/reference/whitepapers/sampleTableFunctions/doc/TableFunctionsWhitePaper.html)
on Table Functions, but am still struggling with the following issue:
>
> I am trying to create an RDB abstraction for a large CICS/VSAM-based legacy system and
blend it with our newer, RDB-based tier.  This seems like a good application of TableFunctions.
 The VSAM data is made available to me via an IP-based proprietary messaging interface.  There
are lots of different files here, but due to some historical forces, most of the data I'm
interested in resides in 4 VSAM files.
>
> Unfortunately, each of those VSAM files has over a 1000 fields in it.
>
> Now eventually, it might be possible to fully model a single VSAM file into (for the
sake of argument) 50 tables; each table/row representing a small slice of a single VSAM record.
>
> In the meantime, for both this proof-of-concept and as a migration path to our existing
clients, I'd like to represent each VSAM file as a table (subject to the 1024 column SQL limitation
per table).  This will be a highly-denormalized and decidedly non-relational view of the data,
but it will be easy to demonstrate and immediately recognizable to our customers.
>
> However, I can't seem to get around the problem of data granularity.  
>
> For example, if my customer executes:
>
> select house_number, street, city from table (legacy_realty_data()) where price <
500000
>   
Since you have only asked for 3 columns, that's all that Derby will 
request from the ResultSet instantiated by your table function. That is, 
Derby is only going to call ResultSet.getXXX() on the house_number, 
street, and city columns. That should behave efficiently provided that 
your ResultSet is smart enough to only fault-in columns for which a 
getXXX() is called.

The WHERE clause is a little trickier. You are right, Derby will read 
all rows from the ResultSet and throw away the rows which don't satisfy 
the WHERE clause. What you want to do is push the qualification through 
the table function to the external data source. I don't see any way to 
do this other than adding some more arguments to your table function. 
For instance, if you could push the qualification through to the 
external data source, then you could get efficient behavior from 
something like the following:

select house_number, street, city
from table( legacy_realty_data( 500000 ) ) s;

Hope this helps,
-Rick

> I don't appear to have any visibility to the actual query inside my legacy_realty_data
TableFunction, so I have to go get all 1000 fields for however many listings are present where
price< 500000 even though only three columns will be requested.  Am I missing something?
 Aside from having the user repeat the columns as parameters to the table function (which
looks awkward to say the least), I can't see a way around this based on my limited knowledge
of Derby.
>
> Is there a way to only retrieve the columns that the user is querying for?
>
> Looking forward to your help/advice.
>
> -chris
>
>   


Mime
View raw message