phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3112) Partial row scan not handled correctly
Date Mon, 02 Oct 2017 21:36:02 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188885#comment-16188885
] 

James Taylor commented on PHOENIX-3112:
---------------------------------------

+1 on the patch, [~sergey.soldatov]. The test I added passes with your patch too. Do you have
a similar one? If not, how about including my test in your patch? IMHO, that's the most important
test as without making a change here, the client silently gets back incomplete results.

Somewhat orthogonal to this, when used, our RoundRobinResultIterator might either get slower
or not function correctly if the scanner returns less rows than is specified by the JDBC fetch
size. Any insight, [~samarthjain]?

[~lhofhansl] - a client may specify {{hbase.client.scanner.caching}} in their own hbase-site.xml
(looks like HBase defaults to 100 while Phoenix defaults Statement fetch size to 1000).

> Partial row scan not handled correctly
> --------------------------------------
>
>                 Key: PHOENIX-3112
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3112
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>            Reporter: Pierre Lacave
>            Assignee: Sergey Soldatov
>            Priority: Blocker
>             Fix For: 4.12.0
>
>         Attachments: PHOENIX-3112-1.patch, PHOENIX-3112-ssa-v4.patch, PHOENIX-3112_v3.patch,
PHOENIX-3112_wip2.patch
>
>
> When doing a select of a relatively large table (a few touthands rows) some rows return
partially missing.
> When increasing the fitler to return those specific rows, the values appear as expected
> {noformat}
> CREATE TABLE IF NOT EXISTS TEST (
>         BUCKET VARCHAR,
>         TIMESTAMP_DATE TIMESTAMP,
>         TIMESTAMP UNSIGNED_LONG NOT NULL,
>         SRC VARCHAR,
>         DST VARCHAR,
>         ID VARCHAR,
>         ION VARCHAR,
>         IC BOOLEAN NOT NULL,
>         MI UNSIGNED_LONG,
>         AV UNSIGNED_LONG,
>         MA UNSIGNED_LONG,
>         CNT UNSIGNED_LONG,
>         DUMMY VARCHAR
>     CONSTRAINT pk PRIMARY KEY (BUCKET, TIMESTAMP DESC, SRC, DST, ID, ION, IC)
> );{noformat}
> using a python script to generate a CSV with 5000 rows
> {noformat}
> for i in xrange(5000):
>     print "5SEC,2016-07-21 07:25:35.{i},146908593500{i},WWWWWWWW,AAA,BBBB,CCCCCCCC,false,{i}1181000,1788000{i},2497001{i},{i},aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa{i}".format(i=i)
> {noformat}
> bulk inserting the csv in the table
> {noformat}
> phoenix/bin/psql.py localhost -t TEST large.csv
> {noformat}
> here we can see one row that contains no TIMESTAMP_DATE and null values in MI and MA
> {noformat}
> 0: jdbc:phoenix:localhost:2181> select * from TEST 
> ....
> +---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+
> | BUCKET  |      TIMESTAMP_DATE      |     TIMESTAMP     |    SRC    | DST  |  ID   |
   ION    |   IC   |      MI      |      AV      |      MA      |  CNT  |                
                  DUMMY                                    |
> +---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+
> | 5SEC    | 2016-07-21 07:25:35.100  | 1469085935001000  | WWWWWWWW  | AAA  | BBBB  |
CCCCCCCC  | false  | 10001181000  | 17880001000  | 24970011000  | 1000  | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1000
 |
> | 5SEC    | 2016-07-21 07:25:35.999  | 146908593500999   | WWWWWWWW  | AAA  | BBBB  |
CCCCCCCC  | false  | 9991181000   | 1788000999   | 2497001999   | 999   | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa999
  |
> | 5SEC    | 2016-07-21 07:25:35.998  | 146908593500998   | WWWWWWWW  | AAA  | BBBB  |
CCCCCCCC  | false  | 9981181000   | 1788000998   | 2497001998   | 998   | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa998
  |
> | 5SEC    |                          | 146908593500997   | WWWWWWWW  | AAA  | BBBB  |
CCCCCCCC  | false  | null         | 1788000997   | null         | 997   |                
                                                           |
> | 5SEC    | 2016-07-21 07:25:35.996  | 146908593500996   | WWWWWWWW  | AAA  | BBBB  |
CCCCCCCC  | false  | 9961181000   | 1788000996   | 2497001996   | 996   | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa996
  |
> | 5SEC    | 2016-07-21 07:25:35.995  | 146908593500995   | WWWWWWWW  | AAA  | BBBB  |
CCCCCCCC  | false  | 9951181000   | 1788000995   | 2497001995   | 995   | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa995
  |
> | 5SEC    | 2016-07-21 07:25:35.994  | 146908593500994   | WWWWWWWW  | AAA  | BBBB  |
CCCCCCCC  | false  | 9941181000   | 1788000994   | 2497001994   | 994   | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa994
  |
> ....
> {noformat}
> but when selecting that row specifically the values are correct
> {noformat}
> 0: jdbc:phoenix:localhost:2181> select * from TEST where timestamp = 146908593500997;
> +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
> | BUCKET  |      TIMESTAMP_DATE      |    TIMESTAMP     |    SRC    | DST  |  ID   |
   ION    |   IC   |     MI      |     AV      |     MA      | CNT  |                    
              DUMMY                                   |
> +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
> | 5SEC    | 2016-07-21 07:25:35.997  | 146908593500997  | WWWWWWWW  | AAA  | BBBB  |
CCCCCCCC  | false  | 9971181000  | 1788000997  | 2497001997  | 997  | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa997
 |
> +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
> 1 row selected (0.159 seconds){noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message