db-derby-dev mailing list archives

From "Rick Hillegas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-4357) TableFunctions provide no information to limit underlying query
Date Tue, 01 Sep 2009 15:44:32 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749952#action_12749952 ]

Rick Hillegas commented on DERBY-4357:

Thanks for the quick feedback, Knut.

>1) It wasn't entirely clear to me how the _nullEqualsNull field in Restriction.ColumnQualifier
is supposed to be used. Is the user supposed to specify somehow that this VTI treats NULLs
in a non-standard way? If so, how (and why)? Or is it meant to implement IS (NOT) NULL restrictions?
Or something else?

It is meant to implement IS NULL. I will clarify this.

>2) The operator '<>' is not defined. Is that intentional?

Yes. The idea is to model the Qualifier functionality which the optimizer already understands.
The comparisons are the same ones found in org.apache.derby.iapi.types.Orderable.
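
To illustrate the answer above, here is a minimal sketch of a comparison set without '<>'. It assumes the five comparisons are <, =, <=, > and >=. The CompareOp enum and its methods are hypothetical names made up for this example; they are not Derby's Restriction or Orderable API. The notEquals helper only shows that '<>' on non-null values can in principle be rebuilt from two of the defined comparisons, if the caller is able to combine them with OR.

```java
// Hypothetical stand-in for the comparison operators discussed above.
// This is NOT Derby's Restriction or Orderable API.
enum CompareOp {
    LESS_THAN, EQUALS, LESS_OR_EQUALS, GREATER_THAN, GREATER_OR_EQUALS;

    // Evaluate "left <op> right" for two non-null comparable values.
    <T extends Comparable<T>> boolean test(T left, T right) {
        int c = left.compareTo(right);
        switch (this) {
            case LESS_THAN:         return c < 0;
            case EQUALS:            return c == 0;
            case LESS_OR_EQUALS:    return c <= 0;
            case GREATER_THAN:      return c > 0;
            case GREATER_OR_EQUALS: return c >= 0;
            default: throw new AssertionError(this);
        }
    }

    // "a <> b" rebuilt as (a < b) OR (a > b); assumes OR is available to the caller.
    static <T extends Comparable<T>> boolean notEquals(T a, T b) {
        return LESS_THAN.test(a, b) || GREATER_THAN.test(a, b);
    }
}
```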

>3) The array passed to setMaterializedColumnNames() contains somewhat redundant information
(both names and positions of the columns to materialize). Would an array of booleans suffice?
Or could the extra information be used for something?

I agree that the information is redundant and my original suggestion was to use a bitmap.
However, Chris reported that this would be cumbersome to use in the real world. I don't think
that the redundancy is harmful. In fact, I have since warmed up to it, because I can see
that it will make it very easy to write a generic table function that wraps
a SELECT against a foreign table and allows you to push projections and restrictions into
the foreign query.
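
As a rough sketch of the generic wrapper idea above, the following shows how pushed-down column names and a restriction could be turned into a foreign SELECT. Everything here is an assumption made for illustration: the ForeignQueryBuilder class, the convention that the array has one slot per declared column with null marking a column that is not needed, and the restriction arriving as ready-made SQL text. None of it is taken from the spec.

```java
import java.util.StringJoiner;

// Hypothetical helper; the names and the array convention are assumptions.
final class ForeignQueryBuilder {
    // materializedColumns: one slot per declared column, null = not needed.
    // restrictionSql: pushed-down restriction as SQL text, or null if none.
    static String build(String table, String[] materializedColumns, String restrictionSql) {
        StringJoiner cols = new StringJoiner(", ");
        for (String name : materializedColumns) {
            if (name != null) {
                cols.add(name);   // projection: keep only the requested columns
            }
        }
        if (cols.length() == 0) {
            throw new IllegalArgumentException("no columns requested");
        }
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(cols.toString())
                .append(" FROM ")
                .append(table);
        if (restrictionSql != null && !restrictionSql.isEmpty()) {
            sql.append(" WHERE ").append(restrictionSql);   // pushed restriction
        }
        // Real code should validate identifiers and bind values as parameters.
        return sql.toString();
    }
}
```

For the query in the issue description, build("FOREIGN_T", new String[] {"A", "B", null}, "C = 123") yields "SELECT A, B FROM FOREIGN_T WHERE C = 123". Because the names are already in the array, the wrapper needs no separate lookup from position to name.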

>4) I cannot find that the spec says what happens if the restricted VTI returns non-qualifying
rows. Should the restrictions be reevaluated outside the VTI? (I think I would prefer that
the restrictions were only seen as hints, so that the VTI doesn't have to check all of them.
Then, if there's a restriction on a column that doesn't really help the VTI retrieve the
rows more efficiently, it could just disregard that restriction and let the SQL do the filtering.)

The intent is that the table function has signed up for a contract to filter rows. If the
table function doesn't fulfill its contract, then the table function has a bug. I will clarify
this in the spec. I do see your point that redundant enforcement of the qualification in Derby
could be useful to some users. I'm prepared to revisit this part of the spec during implementation
if it turns out that the query processor already redundantly enforces restrictions even after
pushing them into the store layer.

The reason that I prefer to regard the filtering as a contract is that I think it will make
it easier to reason about these smart table functions when we make them even smarter: if a
table function claims that it can perform a piece of processing, then Derby should be able
to rely on that claim. I'm thinking that at some point we will want to add a similar mechanism
so that a table function can declare that it can return rows in certain sort orders. I think
that if Derby asks a table function to sort the rows, then Derby doesn't want to perform a
redundant sort outside the table function.
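
The difference between the contract view and the hint view can be sketched as follows. This is only an illustration under assumed names (ContractSketch, restrictedScan, honorsContract); it does not describe how Derby's query processor actually behaves. Under the contract, the table function applies the restriction itself, and the second method shows the redundant check the engine would have to run if restrictions were only hints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; not Derby code.
final class ContractSketch {
    // Contract view: the table function itself returns only qualifying rows.
    static <R> List<R> restrictedScan(List<R> source, Predicate<R> restriction) {
        List<R> out = new ArrayList<>();
        for (R row : source) {
            if (restriction.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    // Hint view: the engine would re-check every returned row like this.
    static <R> boolean honorsContract(List<R> returned, Predicate<R> restriction) {
        for (R row : returned) {
            if (!restriction.test(row)) {
                return false;   // a non-qualifying row leaked through
            }
        }
        return true;
    }
}
```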


> TableFunctions provide no information to limit underlying query
> ---------------------------------------------------------------
>                 Key: DERBY-4357
>                 URL: https://issues.apache.org/jira/browse/DERBY-4357
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions:,,,,
>         Environment: ALL
>            Reporter: Chris Goodacre
>         Attachments: RestrictedTableFunctions.html
> The API specification for TableFunctions cannot provide information to the implementer
of the TableFunction about the details of the query.  For example: 
> (a) I defined a table function named MyFunction with columns a,b, & c
> (b) I bind the table function properly using the CREATE FUNCTION SQL.
> User executes the following SQL:
> select a,b from table ( MyFunction() ) where c = 123
> Without passing the column list and/or where clause as arguments to the table function,
my implementation cannot know that it only needs two of the three columns, and only rows
where c = 123.
> For TableFunctions that are built to integrate distant/legacy data, the cost of the query
can be prohibitive. It would be better if information regarding the columns in the select
and restrictions from the where clause could be passed to the developer.

