drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Krystal (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (DRILL-862) Select against hbase tables with empty number values fails
Date Tue, 12 Aug 2014 15:39:11 GMT

     [ https://issues.apache.org/jira/browse/DRILL-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Krystal updated DRILL-862:
--------------------------

    Comment: was deleted

(was: {code} 
Removing with clause and substituting sub queries failed to plan 

message: "Failure while parsing sql. < CannotPlanException:[ Node [rel#4740:Subset#143.PHYSICAL.SINGLETON([]).[0]]
could not be implemented; planner state:

Root: rel#4740:Subset#143.PHYSICAL.SINGLETON([]).[0]
Original rel:
AbstractConverter(subset=[rel#4740:Subset#143.PHYSICAL.SINGLETON([]).[0]], convention=[PHYSICAL],
DrillDistributionTraitDef=[SINGLETON([])], sort=[[0]]): rowcount = 5.769738915947539E28, cumulative
cost = {inf}, id = 4742
  DrillScreenRel(subset=[rel#4739:Subset#143.LOGICAL.ANY([]).[0]]): rowcount = 5.769738915947539E28,
cumulative cost = {5.76973891594754E27 rows, 5.76973891594754E27 cpu, 0.0 io, 0.0 network},
id = 4738
    DrillLimitRel(subset=[rel#4737:Subset#142.LOGICAL.ANY([]).[0]], fetch=[100]): rowcount
= 5.769738915947539E28, cumulative cost = {100.0 rows, 400.0 cpu, 0.0 io, 0.0 network}, id
= 4736
      DrillSortRel(subset=[rel#4735:Subset#141.LOGICAL.ANY([]).[0]], sort0=[$0], dir0=[ASC]):
rowcount = 5.769738915947539E28, cumulative cost = {1.528404056579686E31 rows, 5.769738915947539E28
cpu, 0.0 io, 0.0 network}, id = 4734
        DrillProjectRel(subset=[rel#4733:Subset#140.LOGICAL.ANY([]).[]], c_customer_id=[$2]):
rowcount = 5.769738915947539E28, cumulative cost = {5.769738915947539E28 rows, 4.0 cpu, 0.0
io, 0.0 network}, id = 4732
          DrillFilterRel(subset=[rel#4731:Subset#139.LOGICAL.ANY([]).[]], condition=[>($1,
CAST(*($4, 1.2)):DECIMAL(2, 1))]): rowcount = 5.769738915947539E28, cumulative cost = {1.1539477831895079E29
rows, 4.6157911327580315E29 cpu, 0.0 io, 0.0 network}, id = 4730
            DrillJoinRel(subset=[rel#4729:Subset#138.LOGICAL.ANY([]).[]], condition=[=($0,
$3)], joinType=[left]): rowcount = 1.1539477831895079E29, cumulative cost = {1.1539477831895079E29
rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 4728
              DrillProjectRel(subset=[rel#4705:Subset#130.LOGICAL.ANY([]).[]], ctr_store_sk=[$1],
ctr_total_return=[$2], c_customer_id=[$5]): rowcount = 1.9138628531992496E11, cumulative cost
= {1.9138628531992496E11 rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 4704
                DrillJoinRel(subset=[rel#4703:Subset#129.LOGICAL.ANY([]).[]], condition=[=($7,
$1)], joinType=[inner]): rowcount = 1.9138628531992496E11, cumulative cost = {1.9138628531992496E11
rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 4702
                  DrillJoinRel(subset=[rel#4698:Subset#126.LOGICAL.ANY([]).[]], condition=[=($0,
$4)], joinType=[inner]): rowcount = 7.088380937775E11, cumulative cost = {7.088380937775E11
rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 4697
                    DrillAggregateRel(subset=[rel#4695:Subset#124.LOGICAL.ANY([]).[]], group=[{0,
1}], ctr_total_return=[SUM($2)]): rowcount = 4.72558729185E7, cumulative cost = {4.72558729185E7
rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 4694
                      DrillProjectRel(subset=[rel#4693:Subset#123.LOGICAL.ANY([]).[]], ctr_customer_sk=[$1],
ctr_store_sk=[$2], SR_REVERSED_CHARGE=[$4]): rowcount = 4.72558729185E8, cumulative cost =
{4.72558729185E8 rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 4692
                        DrillJoinRel(subset=[rel#4691:Subset#122.LOGICAL.ANY([]).[]], condition=[=($3,
$7)], joinType=[inner]): rowcount = 4.72558729185E8, cumulative cost = {4.72558729185E8 rows,
0.0 cpu, 0.0 io, 0.0 network}, id = 4690
                          DrillScanRel(subset=[rel#4686:Subset#119.LOGICAL.ANY([]).[]], table=[[dfs,
tpcds, store_returns]]): rowcount = 287514.0, cumulative cost = {287514.0 rows, 1437570.0
cpu, 0.0 io, 0.0 network}, id = 2472
                          DrillFilterRel(subset=[rel#4689:Subset#121.LOGICAL.ANY([]).[]],
condition=[=(CAST($1):INTEGER, 1998)]): rowcount = 10957.35, cumulative cost = {73049.0 rows,
292196.0 cpu, 0.0 io, 0.0 network}, id = 4688
                            DrillScanRel(subset=[rel#4687:Subset#120.LOGICAL.ANY([]).[]],
table=[[dfs, tpcds, date_dim]]): rowcount = 73049.0, cumulative cost = {73049.0 rows, 219147.0
cpu, 0.0 io, 0.0 network}, id = 2568
                    DrillScanRel(subset=[rel#4696:Subset#125.LOGICAL.ANY([]).[]], table=[[dfs,
tpcds, customer]]): rowcount = 100000.0, cumulative cost = {100000.0 rows, 300000.0 cpu, 0.0
io, 0.0 network}, id = 2649
                  DrillFilterRel(subset=[rel#4701:Subset#128.LOGICAL.ANY([]).[]], condition=[=(CAST($2):CHAR(2)
CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'TN')]): rowcount = 1.7999999999999998,
cumulative cost = {12.0 rows, 48.0 cpu, 0.0 io, 0.0 network}, id = 4700
                    DrillScanRel(subset=[rel#4699:Subset#127.LOGICAL.ANY([]).[]], table=[[dfs,
tpcds, store]]): rowcount = 12.0, cumulative cost = {12.0 rows, 36.0 cpu, 0.0 io, 0.0 network},
id = 2783
              DrillAggregateRel(subset=[rel#4727:Subset#137.LOGICAL.ANY([]).[]], group=[{0}],
agg#0=[AVG($1)]): rowcount = 4.019611545520958E18, cumulative cost = {4.019611545520958E18
rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 4726
                DrillProjectRel(subset=[rel#4725:Subset#136.LOGICAL.ANY([]).[]], $f0=[$3],
ctr_total_return=[$2]): rowcount = 4.019611545520958E19, cumulative cost = {4.019611545520958E19
rows, 8.0 cpu, 0.0 io, 0.0 network}, id = 4724
                  DrillJoinRel(subset=[rel#4723:Subset#135.LOGICAL.ANY([]).[]], condition=[=($3,
$1)], joinType=[inner]): rowcount = 4.019611545520958E19, cumulative cost = {4.019611545520958E19
rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 4722
                    DrillAggregateRel(subset=[rel#4695:Subset#124.LOGICAL.ANY([]).[]], group=[{0,
1}], ctr_total_return=[SUM($2)]): rowcount = 4.72558729185E7, cumulative cost = {4.72558729185E7
rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 4694
                      DrillProjectRel(subset=[rel#4693:Subset#123.LOGICAL.ANY([]).[]], ctr_customer_sk=[$1],
ctr_store_sk=[$2], SR_REVERSED_CHARGE=[$4]): rowcount = 4.72558729185E8, cumulative cost =
{4.72558729185E8 rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 4692
                        DrillJoinRel(subset=[rel#4691:Subset#122.LOGICAL.ANY([]).[]], condition=[=($3,
$7)], joinType=[inner]): rowcount = 4.72558729185E8, cumulative cost = {4.72558729185E8 rows,
0.0 cpu, 0.0 io, 0.0 network}, id = 4690
                          DrillScanRel(subset=[rel#4686:Subset#119.LOGICAL.ANY([]).[]], table=[[dfs,
tpcds, store_returns]]): rowcount = 287514.0, cumulative cost = {287514.0 rows, 1437570.0
cpu, 0.0 io, 0.0 network}, id = 2472
                          DrillFilterRel(subset=[rel#4689:Subset#121.LOGICAL.ANY([]).[]],
condition=[=(CAST($1):INTEGER, 1998)]): rowcount = 10957.35, cumulative cost = {73049.0 rows,
292196.0 cpu, 0.0 io, 0.0 network}, id = 4688
                            DrillScanRel(subset=[rel#4687:Subset#120.LOGICAL.ANY([]).[]],
table=[[dfs, tpcds, date_dim]]): rowcount = 73049.0, cumulative cost = {73049.0 rows, 219147.0
cpu, 0.0 io, 0.0 network}, id = 2568
                    DrillAggregateRel(subset=[rel#4721:Subset#134.LOGICAL.ANY([]).[]], group=[{0}]):
rowcount = 5.670704750219999E12, cumulative cost = {5.670704750219999E12 rows, 0.0 cpu, 0.0
io, 0.0 network}, id = 4720
                      DrillProjectRel(subset=[rel#4719:Subset#133.LOGICAL.ANY([]).[]], $f0=[$1]):
rowcount = 5.670704750219999E13, cumulative cost = {5.670704750219999E13 rows, 4.0 cpu, 0.0
io, 0.0 network}, id = 4718
                        DrillJoinRel(subset=[rel#4717:Subset#132.LOGICAL.ANY([]).[]], condition=[true],
joinType=[inner]): rowcount = 5.670704750219999E13, cumulative cost = {5.670704750219999E13
rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 4716
                          DrillJoinRel(subset=[rel#4715:Subset#131.LOGICAL.ANY([]).[]], condition=[true],
joinType=[inner]): rowcount = 5.67070475022E8, cumulative cost = {5.67070475022E8 rows, 0.0
cpu, 0.0 io, 0.0 network}, id = 4714
                            DrillAggregateRel(subset=[rel#4695:Subset#124.LOGICAL.ANY([]).[]],
group=[{0, 1}], ctr_total_return=[SUM($2)]): rowcount = 4.72558729185E7, cumulative cost =
{4.72558729185E7 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 4694
                              DrillProjectRel(subset=[rel#4693:Subset#123.LOGICAL.ANY([]).[]],
ctr_customer_sk=[$1], ctr_store_sk=[$2], SR_REVERSED_CHARGE=[$4]): rowcount = 4.72558729185E8,
cumulative cost = {4.72558729185E8 rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 4692
                                DrillJoinRel(subset=[rel#4691:Subset#122.LOGICAL.ANY([]).[]],
condition=[=($3, $7)], joinType=[inner]): rowcount = 4.72558729185E8, cumulative cost = {4.72558729185E8
rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 4690
                                  DrillScanRel(subset=[rel#4686:Subset#119.LOGICAL.ANY([]).[]],
table=[[dfs, tpcds, store_returns]]): rowcount = 287514.0, cumulative cost = {287514.0 rows,
1437570.0 cpu, 0.0 io, 0.0 network}, id = 2472
                                  DrillFilterRel(subset=[rel#4689:Subset#121.LOGICAL.ANY([]).[]],
condition=[=(CAST($1):INTEGER, 1998)]): rowcount = 10957.35, cumulative cost = {73049.0 rows,
292196.0 cpu, 0.0 io, 0.0 network}, id = 4688
                                    DrillScanRel(subset=[rel#4687:Subset#120.LOGICAL.ANY([]).[]],
table=[[dfs, tpcds, date_dim]]): rowcount = 73049.0, cumulative cost = {73049.0 rows, 219147.0
cpu, 0.0 io, 0.0 network}, id = 2568
                            DrillScanRel(subset=[rel#4699:Subset#127.LOGICAL.ANY([]).[]],
table=[[dfs, tpcds, store]]): rowcount = 12.0, cumulative cost = {12.0 rows, 36.0 cpu, 0.0
io, 0.0 network}, id = 2783
                          DrillScanRel(subset=[rel#4696:Subset#125.LOGICAL.ANY([]).[]], table=[[dfs,
tpcds, customer]]): rowcount = 100000.0, cumulative cost = {100000.0 rows, 300000.0 cpu, 0.0
io, 0.0 network}, id = 2649
{code})

> Select against hbase tables with empty number values fails 
> -----------------------------------------------------------
>
>                 Key: DRILL-862
>                 URL: https://issues.apache.org/jira/browse/DRILL-862
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - HBase
>            Reporter: Krystal
>            Assignee: Suresh Ollala
>             Fix For: 0.5.0
>
>
> git.commit.id.abbrev=01bf849
> I have a hbase table that contains empty values for some columns/rows.   This was due
to importing data from a file into hbase. For example a get of a row containing empty age
value in hbase:
> hbase(main):003:0> get 'voterspaces', '17'
> COLUMN                       CELL                                                   
                         
>  fourcf:create_date          timestamp=1401380832939, value=2014-06-04 06:49:06     
                         
>  onecf:name                  timestamp=1401380832939, value=alice garcia            
                         
>  threecf:contributions       timestamp=1401380832939, value=468.51                  
                         
>  threecf:voterzone           timestamp=1401380832939, value=18555                   
                         
>  twocf:age                   timestamp=1401380832939, value=                        
                         
>  twocf:registration          timestamp=1401380832939, value=republican       
> The following query from drill fails due to empty values instead of null:
> select cast(row_key as integer) voter_id, convert_from(onecf['name'], 'UTF8') name, cast(twocf['age']
as integer) age, cast(twocf['registration'] as varchar(20)) registration, cast(threecf['contributions']
as decimal(6,2)) contributions, cast(threecf['voterzone'] as integer) voterzone,cast(fourcf['create_date']
as timestamp) create_date from voterspaces where row_key < 20;
> message: "Failure while running fragment. < NumberFormatException:[  ]"
> From hive, running the same query against a hive table that is mapped to the same hbase
table succeeded.   If this how it expected to work in drill, then we should to document it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message