db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-4357) TableFunctions provide no information to limit underlying query
Date Wed, 02 Sep 2009 11:44:32 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750405#action_12750405
] 

Knut Anders Hatlen commented on DERBY-4357:
-------------------------------------------

Thanks for the clarifications, Rick.

I agree that if we add a mechanism to let the table function return the results in a specific
order, it would be pointless to do an extra sort in Derby. I would argue though that there
is a difference between pushing restrictions and ensuring ordering.

With the ordering, the table function itself declares that it can return the rows ordered
by column X, so it's reasonable to expect that it's true that they will in fact be ordered
by X. Also, Derby won't ever ask it to order by column Y. So for column X, where you have
an easy way to do the ordering inside the table function, you can use the power of Java to
order the rows, whereas for column Y you rely on the power of SQL, so you get to combine the
strengths of both Java and SQL.

With the restrictions, the table function has no way to say that it only accepts restrictions
on a certain column, or only certain kinds of operators. So if the table function is required
to enforce all restrictions, you can no longer use the power of SQL for those restrictions
where you have no advantages of implementing them in Java. (Also, though perhaps not likely
to ever be a problem, new operators cannot be added later without breaking the existing restricted
VTIs, as old table functions wouldn't know how to handle them, but they are not allowed to
ignore them.)

As an example, say that we have a table function F(K,V) which is simply wrapping a hash table.
A restriction such as K='abc' would be very helpful to push down, as it could be used as an
argument to Hashtable.get() and reduce the number of rows returned to one (or zero). However,
restrictions on K with operators such as <,>,<=,>=, or any restriction on V, would
require that the entire Hashtable was scanned. One would also have to implement evaluation
of each operator on each column, and one would have to handle arbitrarily nested AND/OR operators.
Whereas this is indeed possible to do, it sounds like wasted work to reimplement what's already
implemented in the SQL engine for no added benefit.

If the restrictions are only seen as a hint, this table function would need only a couple
of lines of code to extract the equals restrictions on K. The full table scan needed for more
complex restrictions could be left to the SQL engine, which should be able to perform the
scan just as efficiently as the user's hand-coded scan. Also, the hand-coded scan will probably
have undergone less testing and will be more error-prone than the code in the SQL engine.

> TableFunctions provide no information to limit underlying query
> ---------------------------------------------------------------
>
>                 Key: DERBY-4357
>                 URL: https://issues.apache.org/jira/browse/DERBY-4357
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0
>         Environment: ALL
>            Reporter: Chris Goodacre
>         Attachments: RestrictedTableFunctions.html
>
>
> The API specification for TableFunctions cannot provide information to the implementer
of the TableFunction about the details of the query.  For example: 
> (a) I defined a table function named MyFunction with columns a,b, & c
> (b) I bind the table function properly using the CREATE FUNCTION SQL.
> User executes the following SQL:
> select a,b from table ( MyFunction() ) where c = 123
> Without passing the column list and/or where clause as arguments to the table function,
my implementation can not know that it only needs two of the three columns, and only rows
where c = 123.
> For TableFunctions that are built to integrate distant/legacy data, the cost of the query
can be prohibitive.   It would be better if information regarding the columns in the select
and restrictions from the where clause could be passed to the developer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message