phoenix-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hieu Nguyen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-5136) Rows with null values inserted by UPSERT .. ON DUPLICATE KEY UPDATE are included in query results when they shouldn't be
Date Wed, 13 Feb 2019 02:11:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766662#comment-16766662
] 

Hieu Nguyen commented on PHOENIX-5136:
--------------------------------------

Looking at the underlying HBase data, we noticed that UPSERT .. ON DUPLICATE KEY UPDATE will
actually set an empty value for a column that should contain a Phoenix NULL value:
{noformat}
UPSERT INTO "testView"("groupKey", "colA", "colB") VALUES ('file_a', NULL, 1) ON DUPLICATE
KEY UPDATE "colB" = 2;
1 row affected (0.018 seconds)


hbase(main):016:0> scan 'test'
ROW                                         COLUMN+CELL
 file_a                                     column=0:\x00\x00\x00\x00, timestamp=1550021698069,
value=x
 file_a                                     column=0:\x80\x0B, timestamp=1550021698069, value=
 file_a                                     column=0:\x80\x0C, timestamp=1550021698069, value=\x80\x00\x00\x00\x00\x00\x00\x01
{noformat}

Regular UPSERT will not set any value for that column:
{noformat}
UPSERT INTO "testView"("groupKey", "colA", "colB") VALUES ('file_a', NULL, 1);
1 row affected (0.017 seconds)

ROW                                         COLUMN+CELL
 file_a                                     column=0:\x00\x00\x00\x00, timestamp=1550021776831,
value=x
 file_a                                     column=0:\x80\x0C, timestamp=1550021776831, value=\x80\x00\x00\x00\x00\x00\x00\x01
1 row(s) in 0.0110 seconds
{noformat}

Note that this was done through a view, but we can re-run the experiment directly against
a table if it helps.

> Rows with null values inserted by UPSERT .. ON DUPLICATE KEY UPDATE are included in query
results when they shouldn't be
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5136
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5136
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0
>            Reporter: Hieu Nguyen
>            Priority: Major
>
> Rows with null values inserted using UPSERT .. ON DUPLICATE KEY UPDATE will be selected
in queries when they should not be.
> Here is a failing test that demonstrates the issue:
> {noformat}
> @Test
> public void testRowsCreatedViaUpsertOnDuplicateKeyShouldNotBeReturnedInQueryIfNotMatched()
throws Exception {
>     Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
>     Connection conn = DriverManager.getConnection(getUrl(), props);
>     String tableName = generateUniqueName();
>     String ddl = " create table " + tableName + "(pk varchar primary key, counter1 bigint,
counter2 smallint)";
>     conn.createStatement().execute(ddl);
>     createIndex(conn, tableName);
>     // The data has to be specifically starting with null for the first counter to fail
the test. If you reverse the values, the test passes.
>     String dml1 = "UPSERT INTO " + tableName + " VALUES('a',NULL,2) ON DUPLICATE KEY
UPDATE " +
>             "counter1 = CASE WHEN (counter1 IS NULL) THEN NULL ELSE counter1 END, " +
>             "counter2 = CASE WHEN (counter1 IS NULL) THEN 2 ELSE counter2 END";
>     conn.createStatement().execute(dml1);
>     conn.commit();
>     String dml2 = "UPSERT INTO " + tableName + " VALUES('b',1,2) ON DUPLICATE KEY UPDATE
" +
>             "counter1 = CASE WHEN (counter1 IS NULL) THEN 1 ELSE counter1 END, " +
>             "counter2 = CASE WHEN (counter1 IS NULL) THEN 2 ELSE counter2 END";
>     conn.createStatement().execute(dml2);
>     conn.commit();
>     // Using this statement causes the test to pass
>     //ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName
+ " WHERE counter2 = 2 AND counter1 = 1");
>     // This statement should be equivalent to the one above, but it selects both rows.
>     ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName +
" WHERE counter2 = 2 AND (counter1 = 1 OR counter1 = 1)");
>     assertTrue(rs.next());
>     assertEquals("b",rs.getString(1));
>     assertEquals(1,rs.getLong(2));
>     assertEquals(2,rs.getLong(3));
>     assertFalse(rs.next());
>     conn.close();
> }{noformat}
> The conditions are fairly specific:
>  * Must use ON DUPLICATE KEY UPDATE.  Inserting rows using UPSERT by itself will have
correct results
>  * The "counter2 = 2 AND (counter1 = 1 OR counter1 = 1)" condition caused the test
to fail, as opposed to the equivalent but simpler "counter2 = 2 AND counter1 = 1".  I tested
a similar "counter2 = 2 AND (counter1 = 1 OR counter1 < 1)", which also caused the test
to fail.
>  * If the NULL value for row 'a' is instead in the last position (counter2), then
row 'a' is not selected in the query as expected.  The below test demonstrates this behavior
(it passes as expected):
> {noformat}
> @Test
> public void testRowsCreatedViaUpsertOnDuplicateKeyShouldNotBeReturnedInQueryIfNotMatched()
throws Exception {
>     Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
>     Connection conn = DriverManager.getConnection(getUrl(), props);
>     String tableName = generateUniqueName();
>     String ddl = " create table " + tableName + "(pk varchar primary key, counter1 bigint,
counter2 smallint)";
>     conn.createStatement().execute(ddl);
>     createIndex(conn, tableName);
>     String dml1 = "UPSERT INTO " + tableName + " VALUES('a',1,NULL) ON DUPLICATE KEY
UPDATE " +
>             "counter1 = CASE WHEN (counter1 IS NULL) THEN 1 ELSE counter1 END, " +
>             "counter2 = CASE WHEN (counter1 IS NULL) THEN NULL ELSE counter2 END";
>     conn.createStatement().execute(dml1);
>     conn.commit();
>     String dml2 = "UPSERT INTO " + tableName + " VALUES('b',1,2) ON DUPLICATE KEY UPDATE
" +
>             "counter1 = CASE WHEN (counter1 IS NULL) THEN 1 ELSE counter1 END, " +
>             "counter2 = CASE WHEN (counter1 IS NULL) THEN 2 ELSE counter2 END";
>     conn.createStatement().execute(dml2);
>     conn.commit();
>     ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM " + tableName +
" WHERE counter1 = 1 AND (counter2 = 2 OR counter2 = 2)");
>     assertTrue(rs.next());
>     assertEquals("b",rs.getString(1));
>     assertEquals(1,rs.getLong(2));
>     assertEquals(2,rs.getLong(3));
>     assertFalse(rs.next());
>     conn.close();
> }
> {noformat}
> We also noticed this behavior when upserting and selecting manually against a View.
> Any ideas on where to look to fix this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message