drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Davis (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-1408) SELECT column from CSV with JOIN returns null if not part of JOIN condition
Date Fri, 12 Sep 2014 14:43:33 GMT
Mike Davis created DRILL-1408:
---------------------------------

             Summary: SELECT column from CSV with JOIN returns null if not part of JOIN condition
                 Key: DRILL-1408
                 URL: https://issues.apache.org/jira/browse/DRILL-1408
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Text & CSV
    Affects Versions: 0.5.0
         Environment: CentOS 6.5, MapR M7
            Reporter: Mike Davis


A SELECT for a column from a CSV file with a JOIN condition always returns null for columns
not included in the JOIN condition.  When querying each table separately, the values are returned
as expected.

Note that this works fine for any combination of JSON and Parquet, but fails when at least
one of the files is CSV.

Simple example with two small CSV files:

beatles.csv:
---------------------------
1,John,Lennon
2,Paul,McCartney
3,George,Harrison
4,Ringo,Starr
----------------------------

songs.csv:
----------------------------
1,Help
2,Yesterday
3,Blue Jay Way
4,Yellow Submarine
----------------------------

This queries returns values as expected:

SELECT columns[0] AS id, CONCAT(columns[1], ' ', columns[2]) AS singer FROM dfs.`beatles.csv`;


This query returns 4 results, all with null values:

SELECT S.columns[1] AS song, CONCAT(B.columns[1], ' ', B.columns[2]) AS singer 
FROM dfs.`beatles.csv` AS B
INNER JOIN dfs.`songs.csv` AS S ON B.columns[0] = S.columns[0];


The only columns that return non-null values are the ones from the JOIN condition (B.columns[0]
and S.columns[0] in the following query):

SELECT S.columns[1] AS song, CONCAT(B.columns[1], ' ', B.columns[2]) AS singer, S.columns[0]
AS beatles_id, B.columns[0] AS id 
FROM dfs.`beatles.csv` AS B
INNER JOIN dfs.`songs.csv` AS S ON B.columns[0] = S.columns[0];




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message