db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DERBY-6863) NPE when multiple values are contained in an IN statement within a CASE statement used in a GROUP BY
Date Sun, 21 Feb 2016 18:01:18 GMT

    [ https://issues.apache.org/jira/browse/DERBY-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156121#comment-15156121
] 

Bryan Pendleton edited comment on DERBY-6863 at 2/21/16 6:00 PM:
-----------------------------------------------------------------

I tried various perturbations of the failing query and succeeded in moving the
NPE around from place to place, but did not come up with a formulation that
succeeds. 

For example, the NPE occurrs whether we SELECT or SELECT DISTINCT,
although the precise location of the NPE shifts about somewhat.

I also spent some time looking at the exact stack trace of the crash in more detail.

Since the query is a GROUP BY, and, moreover, is a DISTINCT, there are sorts
involved. We have to sort the data on the result of the CASE expression in
order to compute the GROUP BY aggregation on SUM(COST), and then we
later have to sort the grouped-and-aggregated data on all the columns in order
to remove any duplicates from the DISTINCT.

With sufficient tracing, I can see that the raw underlying table data is successfully
read from the base table, and that the generated code for the CASE expression
processes those rows correctly, turning 'Asia' and 'Australia' into 'A', while leaving
the other region values alone.

And then I can see that, once the original scan and sort of the base data completes,
we close the base table result set and clear it from the BaseActivation object; the
stack when this occurs looks like:

    [junit]     at org.apache.derby.impl.sql.execute.BaseActivation.clearCurrentRow(BaseActivation.java:1390)
    [junit]     at org.apache.derby.impl.sql.execute.NoPutResultSetImpl.clearCurrentRow(NoPutResultSetImpl.java:354)
    [junit]     at org.apache.derby.impl.sql.execute.TableScanResultSet.close(TableScanResultSet.java:585)
    [junit]     at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.close(BulkTableScanResultSet.java:444)
    [junit]     at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.close(ProjectRestrictResultSet.java:402)
    [junit]     at org.apache.derby.impl.sql.execute.GroupedAggregateResultSet.loadSorter(GroupedAggregateResultSet.java:316)
    [junit]     at org.apache.derby.impl.sql.execute.GroupedAggregateResultSet.openCore(GroupedAggregateResultSet.java:224)

But then, a bit later in the query, the higher-level sort operation (used for the DISTINCT)
causes us to need to re-execute the computed expression on the query results:

        at org.apache.derby.exe.ac5a9fc15bx0153x0381x5b15xffffd9ce61af1.e1(Unknown Source)
        at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGeneratedClass.java:103)
        at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.doProjection(ProjectRestrictResultSet.java:531)
        at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(ProjectRestrictResultSet.java:325)
        at org.apache.derby.impl.sql.execute.SortResultSet.getRowFromResultSet(SortResultSet.java:574)
        at org.apache.derby.impl.sql.execute.SortResultSet.getNextRowFromRS(SortResultSet.java:562)
        at org.apache.derby.impl.sql.execute.SortResultSet.loadSorter(SortResultSet.java:311)
        at org.apache.derby.impl.sql.execute.SortResultSet.openCore(SortResultSet.java:258)
        at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(BasicNoPutResultSetImpl.java:266)

And, at that point, we can no longer execute the CASE expression on the data, because
the underlying result set has been closed, and so we get the NPE.

It seems to me that the flaw lies in how we are manipulating the results of 
the GROUP BY, later in the query.

Once the GROUP BY has been computed, by the GroupedAggregateResultSet,
we no longer have "raw" table data anymore; we have a processed, derived,
aggregated data set, and hence when we re-process that result set (to re-sort
it in the case of DISTINCT, or simply to display it if the DISTINCT is not present),
we should not be attempting to re-compute the expressions in the data; we
should just be using the columns of the data as is.

But the way the code appears to be executing, it seems that we are trying
to compute the CASE expression twice:
- firstly, we compute it during the GROUP BY  processing, which seems correct
- but then later we try to re-compute it during further processing of the GROUP BY
  results, which seems wrong.

Perhaps this is related to the fundamental instabilities of the DERBY-883
features, as Knut Anders suggested earlier.

I'm not sure how much farther I can get with this right now; I suspect that to
really comprehend the behavior of this query I'm going to have to dig deep
into the original DERBY-883 implementation, which may take a while.

But maybe these observations are useful for others who are interested in this crash.



was (Author: bryanpendleton):
I tried various perturbations of the failing query and succeeded in moving the
NPE around from place to place, but did not come up with a formulation that
succeeds. 

For example, the NPE occurrs whether we SELECT or SELECT DISTINCT,
although the precise location of the NPE shifts about somewhat.

I also spent some time looking at the exact stack trace of the crash in more detail.

Since the query is a GROUP BY, and, moreover, is a DISTINCT, there are sorts
involved. We have to sort the data on the result of the CASE expression in
order to compute the GROUP BY aggregation on SUM(COST), and then we
later have to sort the grouped-and-aggregated data on all the columns in order
to remove any duplicates from the DISTINCT.

With sufficient tracing, I can see that the raw underlying table data is successfully
read from the base table, and that the generated code for the CASE expression
processes those rows correctly, turning 'Asia' and 'Australia' into 'A', while leaving
the other region values alone.

And then I can see that, once the original scan and sort of the base data completes,
we close the base table result set and clear it from the BaseActivation object; the
stack when this occurs looks like:

    [junit]     at org.apache.derby.impl.sql.execute.BaseActivation.clearCurrentRow(BaseActivation.java:1390)
    [junit]     at org.apache.derby.impl.sql.execute.NoPutResultSetImpl.clearCurrentRow(NoPutResultSetImpl.java:354)
    [junit]     at org.apache.derby.impl.sql.execute.TableScanResultSet.close(TableScanResultSet.java:585)
    [junit]     at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.close(BulkTableScanResultSet.java:444)
    [junit]     at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.close(ProjectRestrictResultSet.java:402)
    [junit]     at org.apache.derby.impl.sql.execute.GroupedAggregateResultSet.loadSorter(GroupedAggregateResultSet.java:316)
    [junit]     at org.apache.derby.impl.sql.execute.GroupedAggregateResultSet.openCore(GroupedAggregateResultSet.java:224)

But then, a bit later in the query, the higher-level sort operation (used for the DISTINCT)
causes us to need to re-execute the computed expression on the query results:

    [junit]     at org.apache.derby.impl.sql.execute.BaseActivation.clearCurrentRow(BaseActivation.java:1390)
    [junit]     at org.apache.derby.impl.sql.execute.NoPutResultSetImpl.clearCurrentRow(NoPutResultSetImpl.java:354)
    [junit]     at org.apache.derby.impl.sql.execute.TableScanResultSet.close(TableScanResultSet.java:585)
    [junit]     at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.close(BulkTableScanResultSet.java:444)
    [junit]     at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.close(ProjectRestrictResultSet.java:402)
    [junit]     at org.apache.derby.impl.sql.execute.GroupedAggregateResultSet.loadSorter(GroupedAggregateResultSet.java:316)
    [junit]     at org.apache.derby.impl.sql.execute.GroupedAggregateResultSet.openCore(GroupedAggregateResultSet.java:224)

And, at that point, we can no longer execute the CASE expression on the data, because
the underlying result set has been closed, and so we get the NPE.

It seems to me that the flaw lies in how we are manipulating the results of 
the GROUP BY, later in the query.

Once the GROUP BY has been computed, by the GroupedAggregateResultSet,
we no longer have "raw" table data anymore; we have a processed, derived,
aggregated data set, and hence when we re-process that result set (to re-sort
it in the case of DISTINCT, or simply to display it if the DISTINCT is not present),
we should not be attempting to re-compute the expressions in the data; we
should just be using the columns of the data as is.

But the way the code appears to be executing, it seems that we are trying
to compute the CASE expression twice:
- firstly, we compute it during the GROUP BY  processing, which seems correct
- but then later we try to re-compute it during further processing of the GROUP BY
  results, which seems wrong.

Perhaps this is related to the fundamental instabilities of the DERBY-883
features, as Knut Anders suggested earlier.

I'm not sure how much farther I can get with this right now; I suspect that to
really comprehend the behavior of this query I'm going to have to dig deep
into the original DERBY-883 implementation, which may take a while.

But maybe these observations are useful for others who are interested in this crash.


> NPE when multiple values are contained in an IN statement within a CASE statement used
in a GROUP BY
> ----------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-6863
>                 URL: https://issues.apache.org/jira/browse/DERBY-6863
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.12.1.1
>         Environment: Java 8, Mint Linux, 24G, Used within an in-memory table
>            Reporter: Peter Damen
>         Attachments: ac_86.decomp, derby.log, tests.diff
>
>
> An IN statement within a CASE statement within a GROUP BY, that contains more than one
element will cause a NPE.
> Reproduction Steps:
> ======== DATA ============
> CREATE TABLE Test (
>        Region VARCHAR(20),
>        Cost INTEGER
> );
> INSERT INTO Test VALUES ('Australia', 45);
> INSERT INTO Test VALUES ('Asia', 22);
> INSERT INTO Test VALUES ('North America', 33);
> INSERT INTO Test VALUES ('South America', 55);
> INSERT INTO Test VALUES ('Europe', 44);
> ======= SQL WORKS ===========
> SELECT DISTINCT
>    CASE
>       WHEN 1 = 0 THEN "REGION"
>       WHEN "REGION" IN ('Asia') THEN 'A'
>       ELSE "REGION"
>    END,
>    SUM("COST")
> FROM TEST
> GROUP BY 
>    CASE
>       WHEN 1 = 0 THEN "REGION"
>       WHEN "REGION" IN ('Asia') THEN 'A'
>       ELSE "REGION"
>    END
> ============ FAILS ===================
>    SELECT DISTINCT
>       CASE
>          WHEN 1 = 0 THEN "REGION"
>          WHEN "REGION" IN ('Asia', 'Australia') THEN 'A'
>          ELSE "REGION"
>       END,
>       SUM("COST")
>    FROM TEST
>    GROUP BY 
>       CASE
>          WHEN 1 = 0 THEN "REGION"
>          WHEN "REGION" IN ('Asia','Australia') THEN 'A'
>          ELSE "REGION"
>       END
> ================== NPE ===================
> java.sql.SQLException: Java exception: ': java.lang.NullPointerException'.
> 	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.Util.javaException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> 	at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:622)
> 	at com.onseven.dbvis.b.B.B.ā(Z:2256)
> 	at com.onseven.dbvis.b.B.F$A.call(Z:2838)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:701)
> Caused by: ERROR XJ001: Java exception: ': java.lang.NullPointerException'.
> 	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown
Source)
> 	... 22 more
> Caused by: java.lang.NullPointerException
> 	at org.apache.derby.exe.acd8cd40bax0152xa074xeeddx0000012ba0d05.e1(Unknown Source)
> 	at org.apache.derby.impl.services.reflect.DirectCall.invoke(Unknown Source)
> 	at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.doProjection(Unknown Source)
> 	at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown
Source)
> 	at org.apache.derby.impl.sql.execute.SortResultSet.getRowFromResultSet(Unknown Source)
> 	at org.apache.derby.impl.sql.execute.SortResultSet.getNextRowFromRS(Unknown Source)
> 	at org.apache.derby.impl.sql.execute.SortResultSet.loadSorter(Unknown Source)
> 	at org.apache.derby.impl.sql.execute.SortResultSet.openCore(Unknown Source)
> 	at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(Unknown Source)
> 	at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
> 	at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
> 	... 13 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message