Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 47316 invoked from network); 20 Aug 2009 22:06:54 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 20 Aug 2009 22:06:54 -0000 Received: (qmail 96093 invoked by uid 500); 20 Aug 2009 22:07:12 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 96040 invoked by uid 500); 20 Aug 2009 22:07:12 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 96017 invoked by uid 99); 20 Aug 2009 22:07:12 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Aug 2009 22:07:12 +0000 X-ASF-Spam-Status: No, hits=1.0 required=10.0 tests=SPF_PASS,SUBJ_RE_NUM X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: local policy) Received: from [68.142.207.194] (HELO web32401.mail.mud.yahoo.com) (68.142.207.194) by apache.org (qpsmtpd/0.29) with SMTP; Thu, 20 Aug 2009 22:07:01 +0000 Received: (qmail 42809 invoked by uid 60001); 20 Aug 2009 22:06:40 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1250805999; bh=v1GtF0RErmTzYDbtj14KkP/p1MWBdLGn3hXCqrXobe4=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:References:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=B9l/P7onzqHJ8hYE83bzMFeM3eupuE1+eTWI4yJhALUS+sF4DxqLV499+sVBqqq97m/3aMnczp6COMoLJDm2voe699s+zNBOd7h42QaweZ8ToN/gB+2RG4RIvOZmBhD1VKP7SUfKNtMfE4OFVUHheqMU1l4ioga5F8XTD0DuGxY= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:References:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=mqCBqPnqriSx6buzB6/kl33MKUzvtoczWWhT9LHyVxSZ4Q20yJWjEoaaV7nSaLesNAXMkvyNO/uJg5HPi7eoTkWcwmzD9s/0dhrw6AL5xKNjSN2O6TdMYtB7Xij9tK/5klOGRFHIg6CmYYwPtksG0x6PQ2xhbLZAdkGUDaxx4rA=; Message-ID: <925300.42246.qm@web32401.mail.mud.yahoo.com> X-YMail-OSG: TllJj5MVM1lxAaOc_hyi6EbDD461UBFApZC9mI2ifcynf0Ycz6BDSIaxijpbzZN0I4PcjmAVEFFJq2xvKwLLm_dYws9eZMy3qhT__.tJIJwyNOQC67m_HW57wlsQRDRp.VvQrx7uOuNumOzpNktJXG1aJ_oxMqWiDgUPOuDZAdxn4oMSCsLDMKpyEeUHgH4wiggkig92DTtFNV1kcmLTeV_K_4NA7HKe1IpOemR77oP5Rzrc_.H4y8ls44jMuHj6tZ29jMdUEtWLWqfr1N0gYSl4QnFG.dfLSx7NfrWo.s2egFz1RwqDaqPovulbFXf6vrrgwiMxPT40OKVGQehIr_eeLS_.Xya9VJvsow-- Received: from [67.153.8.235] by web32401.mail.mud.yahoo.com via HTTP; Thu, 20 Aug 2009 15:06:39 PDT X-Mailer: YahooMailRC/1358.27 YahooMailWebService/0.7.338.2 References: <827290.59856.qm@web32406.mail.mud.yahoo.com> <4A649375.1030501@sun.com> <956237.88882.qm@web32407.mail.mud.yahoo.com> <4A64CEBF.8080304@sun.com> Date: Thu, 20 Aug 2009 15:06:39 -0700 (PDT) From: Chris Goodacre Subject: Re[2]: Question about TableFunctions in Derby To: Derby Discussion In-Reply-To: <4A64CEBF.8080304@sun.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Virus-Checked: Checked by ClamAV on apache.org Rick, Sorry it's taken me so long to reply on this. I just today got back to this in earnest. I'll try to walk through an example, imagining that I have an array of ScanQualifiers that gets passed to my table function's method, just to make sure I understand this. public static ResultSet read(ScanQualifier[] qualifiers) { // ... impl } So, if I were to go back to my original example: select house_number, street, city from table (legacy_realty_data()) where price < 500000 a) I think that an array with only a single ScanQualifier object would be passed to my read(...) method. b) I can see where the operator for the ScanQualifier object would be some negative number c) The column id would reference the column # (basically) of the price column from the table definition of my CREATE FUNCTION statement. d) The result of getOrderable() on the scanqualifier object would return me a DataValueDescriptor. e) I could interrogate the DataValueDescriptor to get the value (500000) in a type/manner that I could use to pass on to my legacy system I could use this information to restrict the number of rows that come back. That's good. It would still be nice if I could restrict the number of columns I'm requesting up front. It's expensive to go back and forth to this system, so I would rather make one read (all relevant rows, all relevant columns) and take the chance that the user only uses some of the rows from the result set. Would it be possible to use a ScanQualifier (or something like it) to inform the table procedure methods which specific (non-calculated) columns are in the query? -chris ----- Original Message ---- From: Rick Hillegas To: Derby Discussion Sent: Monday, July 20, 2009 3:08:31 PM Subject: Re: Question about TableFunctions in Derby Hi Chris, Reducing the number of column probes may be possible without any changes to Derby: When your ResultSet is asked to get a column, it can remember that request. On later rows, your ResultSet can ask the external data source for all of the column positions it has remembered so far. In the query you gave, this would play out like this: 1) On the first row, your ResultSet would make 3 calls to the external data source, one for each column. But the ResultSet would remember which columns were requested. 2) For each of the remaining N-1 rows, your ResultSet would call the external data source only once, asking the external data source for all three columns in a single batch. That batch could then be cached and the individual columns could be returned to Derby when Derby called the getXXX() methods. Positioning and restricting the rows themselves (the WHERE clause fragments) is tricker. It probably requires help from Derby, as you suggest. We could design some interface by which Derby would pass the ResultSet a list of org.apache.derby.iapi.sql.execute.ScanQualifier. Your ResultSet could then forward those directives to the external data source. What do you think? -Rick Chris Goodacre wrote: > Rick, thanks for your suggestions. Perhaps I am being obtuse, but when you say ... > > "Since you have only asked for 3 columns, that's all that Derby will > request from the ResultSet instantiated by your table function. That > is, Derby is only going to call ResultSet.getXXX() on the house_number, > street, and city columns. That should behave efficiently provided that > your ResultSet is smart enough to only fault-in columns for which a > getXXX() is called." > > Does that mean that I make a separate request to the legacy system each time getXXX() is called - i.e. lazily initialize each column in the result set? I think this has to be the only way to do it, since I don't know which columns will be requested at the time the read() method of my tablefunction is invoked. > Making (in this case) 3 calls to the legacy system to get 1 column for N rows is certainly better than making 1 call to the legacy system to get 1000 columns for N rows and then throwing away 997*N values/cells, but still not quite as nice as I'd like. > If I were making a wish - I'd wish for some sort of parsed representation of the query get passed to the read method (or to some other method - similar to, or even as part of, the query optimization interface). Ideally, this structured representation would have the list of columns belonging to the table function from the select list, and the where clause components specific to the table function only (i.e. mytablefunction.price > 50000 but NOT mytablefunction.price < myrealtable.value). > > In the absence of this, when the VTIResultSet class passes the ActivationHolder to the derby class which invokes the read() method reflectively, why can't that class pass the activation context (it knows it is dealing with a derby table function, it knows the class name, it has access to the result set descriptor, if not the where clause) pass this information along to the user's table function class? I would happily implement an interface in this class (not sure why read() has to be static) to get this information prior to resultset instantiation. > > -chris > > > > ----- Original Message ---- > From: Rick Hillegas > To: Derby Discussion > Sent: Monday, July 20, 2009 10:55:33 AM > Subject: Re: Question about TableFunctions in Derby > > Hi Chris, > > Some comments inline... > > Chris Goodacre wrote: > >> I've read the Derby developer's guide and Rick Hillegas's informative white paper (http://developers.sun.com/javadb/reference/whitepapers/sampleTableFunctions/doc/TableFunctionsWhitePaper.html) on Table Functions, but am still struggling with the following issue: >> >> I am trying to create an RDB abstraction for a large CICS/VSAM-based legacy system and blend it with our newer, RDB-based tier. This seems like a good application of TableFunctions. The VSAM data is made available to me via an IP-based proprietary messaging interface. There are lots of different files here, but due to some historical forces, most of the data I'm interested in resides in 4 VSAM files. >> >> Unfortunately, each of those VSAM files has over a 1000 fields in it. >> >> Now eventually, it might be possible to fully model a single VSAM file into (for the sake of argument) 50 tables; each table/row representing a small slice of a single VSAM record. >> >> In the meantime, for both this proof-of-concept and as a migration path to our existing clients, I'd like to represent each VSAM file as a table (subject to the 1024 column SQL limitation per table). This will be a highly-denormalized and decidedly non-relational view of the data, but it will be easy to demonstrate and immediately recognizable to our customers. >> >> However, I can't seem to get around the problem of data granularity. For example, if my customer executes: >> >> select house_number, street, city from table (legacy_realty_data()) where price < 500000 >> > Since you have only asked for 3 columns, that's all that Derby will request from the ResultSet instantiated by your table function. That is, Derby is only going to call ResultSet.getXXX() on the house_number, street, and city columns. That should behave efficiently provided that your ResultSet is smart enough to only fault-in columns for which a getXXX() is called. > > The WHERE clause is a little trickier. You are right, Derby will read all rows from the ResultSet and throw away the rows which don't satisfy the WHERE clause. What you want to do is push the qualification through the table function to the external data source. I don't see any way to do this other than adding some more arguments to your table function. For instance, if you could push the qualification through to the external data source, then you could get efficient behavior from something like the following: > > select house_number, street, city > from table( legacy_realty_data( 500000 ) ) s; > > Hope this helps, > -Rick > > >> I don't appear to have any visibility to the actual query inside my legacy_realty_data TableFunction, so I have to go get all 1000 fields for however many listings are present where price< 500000 even though only three columns will be requested. Am I missing something? Aside from having the user repeat the columns as parameters to the table function (which looks awkward to say the least), I can't see a way around this based on my limited knowledge of Derby. >> >> Is there a way to only retrieve the columns that the user is querying for? >> >> Looking forward to your help/advice. >> >> -chris >> >>