drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (DRILL-3062) regression: Mondrian query447.q - lots of rows missing in result set
Date Wed, 13 May 2015 19:13:01 GMT

     [ https://issues.apache.org/jira/browse/DRILL-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Aman Sinha reassigned DRILL-3062:
---------------------------------

    Assignee: Aman Sinha  (was: Daniel Barclay (Drill))

> regression: Mondrian query447.q - lots of rows missing in result set
> --------------------------------------------------------------------
>
>                 Key: DRILL-3062
>                 URL: https://issues.apache.org/jira/browse/DRILL-3062
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.0.0
>            Reporter: Chun Chang
>            Assignee: Aman Sinha
>            Priority: Blocker
>
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select * from sys.version;
> +------------+----------------+-------------+-------------+------------+
> | commit_id  | commit_message | commit_time | build_email | build_time |
> +------------+----------------+-------------+-------------+------------+
> | d1526f9462f6817a76631464ff332bb99b3bdf28 | DRILL-2750: Running 1 or more queries against
Drillbits having insufficient DirectMem renders the Drillbits in an unusable state | 13.05.2015
@ 08:47:20 EDT | Unknown     | 13.05.2015 @ 10:44:43 EDT |
> +------------+----------------+-------------+-------------+------------+
> {code}
> Many (total of 42) mondrian queries regressed. All of them missing rows in the returned
result set.
> Here is an example, query447.q
> {code}
> SELECT time_by_day.the_year            AS c0, 
>        product_class.product_family    AS c1, 
>        customer.state_province         AS c2, 
>        customer.city                   AS c3, 
>        Sum(sales_fact_1997.unit_sales) AS m0 
> FROM   time_by_day AS time_by_day, 
>        sales_fact_1997 AS sales_fact_1997, 
>        product_class AS product_class, 
>        product AS product, 
>        customer AS customer 
> WHERE  sales_fact_1997.time_id = time_by_day.time_id 
>        AND time_by_day.the_year = 1997 
>        AND sales_fact_1997.product_id = product.product_id 
>        AND product.product_class_id = product_class.product_class_id 
>        AND product_class.product_family = 'Drink' 
>        AND sales_fact_1997.customer_id = customer.customer_id 
>        AND customer.state_province = 'WA' 
>        AND customer.city IN ( 'Anacortes', 'Ballard', 'Bellingham', 'Bremerton', 
>                               'Burien', 'Edmonds', 'Everett', 'Issaquah', 
>                               'Kirkland', 'Lynnwood', 'Marysville', 'Olympia', 
>                               'Port Orchard', 'Puyallup', 'Redmond', 'Renton', 
>                               'Seattle', 'Sedro Woolley', 'Spokane', 'Tacoma', 
>                               'Walla Walla', 'Yakima' ) 
> GROUP  BY time_by_day.the_year, 
>           product_class.product_family, 
>           customer.state_province, 
>           customer.city; 
> {code}
> This query should return the following result:
> {code}
> [root@qa-node120 mondrian]# cat query447.e
> 1997	Drink	WA	Walla Walla	191.0000
> 1997	Drink	WA	Issaquah	203.0000
> 1997	Drink	WA	Everett	208.0000
> 1997	Drink	WA	Olympia	1066.0000
> 1997	Drink	WA	Edmonds	166.0000
> 1997	Drink	WA	Bremerton	1160.0000
> 1997	Drink	WA	Renton	225.0000
> 1997	Drink	WA	Bellingham	68.0000
> 1997	Drink	WA	Ballard	214.0000
> 1997	Drink	WA	Burien	251.0000
> 1997	Drink	WA	Seattle	168.0000
> 1997	Drink	WA	Redmond	137.0000
> 1997	Drink	WA	Lynnwood	201.0000
> 1997	Drink	WA	Puyallup	1040.0000
> 1997	Drink	WA	Tacoma	986.0000
> 1997	Drink	WA	Kirkland	247.0000
> 1997	Drink	WA	Sedro Woolley	58.0000
> 1997	Drink	WA	Yakima	1159.0000
> 1997	Drink	WA	Port Orchard	1128.0000
> 1997	Drink	WA	Spokane	2238.0000
> 1997	Drink	WA	Anacortes	82.0000
> 1997	Drink	WA	Marysville	193.0000
> {code}
> But drill now returns:
> {code}
> 1997	Drink	WA	Sedro Woolley	58.0000
> {code}
> Here is the plan:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select time_by_day.the_year
as c0, product_class.product_family as c1, customer.state_province as c2, customer.city as
c3, sum(sales_fact_1997.unit_sales) as m0 from time_by_day as time_by_day, sales_fact_1997
as sales_fact_1997, product_class as product_class, product as product, customer as customer
where sales_fact_1997.time_id = time_by_day.time_id and time_by_day.the_year = 1997 and sales_fact_1997.product_id
= product.product_id and product.product_class_id = product_class.product_class_id and product_class.product_family
= 'Drink' and sales_fact_1997.customer_id = customer.customer_id and customer.state_province
= 'WA' and customer.city in ('Anacortes', 'Ballard', 'Bellingham', 'Bremerton', 'Burien',
'Edmonds', 'Everett', 'Issaquah', 'Kirkland', 'Lynnwood', 'Marysville', 'Olympia', 'Port Orchard',
'Puyallup', 'Redmond', 'Renton', 'Seattle', 'Sedro Woolley', 'Spokane', 'Tacoma', 'Walla Walla',
'Yakima') group by time_by_day.the_year, product_class.product_family, customer.state_province,
customer.city;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(c0=[$0], c1=[$1], c2=[$2], c3=[$3], m0=[$4])
> 00-02        HashAgg(group=[{0, 1, 2, 3}], m0=[SUM($4)])
> 00-03          Project(c0=[$0], c1=[$2], c2=[$3], c3=[$4], unit_sales=[$1])
> 00-04            HashJoin(condition=[=($5, $6)], joinType=[inner])
> 00-06              Project(the_year=[$0], unit_sales=[$5], product_family=[$6], state_province=[$10],
city=[$11], f17=[$11])
> 00-08                Project(the_year=[$4], time_id=[$5], time_id0=[$0], product_id=[$1],
customer_id=[$2], unit_sales=[$3], product_family=[$8], product_class_id=[$9], product_id0=[$6],
product_class_id0=[$7], state_province=[$10], city=[$11], customer_id0=[$12])
> 00-09                  HashJoin(condition=[=($2, $12)], joinType=[inner])
> 00-11                    HashJoin(condition=[=($1, $6)], joinType=[inner])
> 00-14                      HashJoin(condition=[=($0, $5)], joinType=[inner])
> 00-18                        Project(time_id=[$2], product_id=[$0], customer_id=[$1],
unit_sales=[$3])
> 00-23                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/drill/testdata/mondrian/sales_fact_1997]], selectionRoot=/drill/testdata/mondrian/sales_fact_1997,
numFiles=1, columns=[`time_id`, `product_id`, `customer_id`, `unit_sales`]]])
> 00-17                        Project(the_year=[$0], time_id0=[$1])
> 00-22                          SelectionVectorRemover
> 00-26                            Filter(condition=[=($0, 1997)])
> 00-28                              Project(the_year=[$1], time_id=[$0])
> 00-30                                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/drill/testdata/mondrian/time_by_day]], selectionRoot=/drill/testdata/mondrian/time_by_day,
numFiles=1, columns=[`the_year`, `time_id`]]])
> 00-13                      Project(product_id0=[$0], product_class_id=[$1], product_family=[$2],
product_class_id0=[$3])
> 00-16                        HashJoin(condition=[=($1, $3)], joinType=[inner])
> 00-21                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/drill/testdata/mondrian/product]], selectionRoot=/drill/testdata/mondrian/product,
numFiles=1, columns=[`product_id`, `product_class_id`]]])
> 00-20                          Project(product_family=[$0], product_class_id0=[$1])
> 00-25                            SelectionVectorRemover
> 00-27                              Filter(condition=[=($0, 'Drink')])
> 00-29                                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/drill/testdata/mondrian/product_class]], selectionRoot=/drill/testdata/mondrian/product_class,
numFiles=1, columns=[`product_family`, `product_class_id`]]])
> 00-10                    Project(state_province=[$0], city=[$1], customer_id0=[$2])
> 00-12                      SelectionVectorRemover
> 00-15                        Filter(condition=[=($0, 'WA')])
> 00-19                          Project(state_province=[$1], city=[$2], customer_id=[$0])
> 00-24                            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/drill/testdata/mondrian/customer]], selectionRoot=/drill/testdata/mondrian/customer,
numFiles=1, columns=[`state_province`, `city`, `customer_id`]]])
> 00-05              HashAgg(group=[{0}])
> 00-07                Values
>  | {
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message