drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1408) SELECT column from CSV with JOIN returns null if not part of JOIN condition
Date Fri, 12 Sep 2014 14:48:34 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131621#comment-14131621
] 

Aman Sinha commented on DRILL-1408:
-----------------------------------

This is a duplicate of DRILL-1401 which should be fixed shortly.  

> SELECT column from CSV with JOIN returns null if not part of JOIN condition
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-1408
>                 URL: https://issues.apache.org/jira/browse/DRILL-1408
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 0.5.0
>         Environment: CentOS 6.5, MapR M7
>            Reporter: Mike Davis
>
> A SELECT for a column from a CSV file with a JOIN condition always returns null for columns
not included in the JOIN condition.  When querying each table separately, the values are returned
as expected.
> Note that this works fine for any combination of JSON and Parquet, but fails when at
least one of the files is CSV.
> Simple example with two small CSV files:
> beatles.csv:
> ---------------------------
> 1,John,Lennon
> 2,Paul,McCartney
> 3,George,Harrison
> 4,Ringo,Starr
> ----------------------------
> songs.csv:
> ----------------------------
> 1,Help
> 2,Yesterday
> 3,Blue Jay Way
> 4,Yellow Submarine
> ----------------------------
> This queries returns values as expected:
> SELECT columns[0] AS id, CONCAT(columns[1], ' ', columns[2]) AS singer FROM dfs.`beatles.csv`;
> This query returns 4 results, all with null values:
> SELECT S.columns[1] AS song, CONCAT(B.columns[1], ' ', B.columns[2]) AS singer 
> FROM dfs.`beatles.csv` AS B
> INNER JOIN dfs.`songs.csv` AS S ON B.columns[0] = S.columns[0];
> The only columns that return non-null values are the ones from the JOIN condition (B.columns[0]
and S.columns[0] in the following query):
> SELECT S.columns[1] AS song, CONCAT(B.columns[1], ' ', B.columns[2]) AS singer, S.columns[0]
AS beatles_id, B.columns[0] AS id 
> FROM dfs.`beatles.csv` AS B
> INNER JOIN dfs.`songs.csv` AS S ON B.columns[0] = S.columns[0];



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message