Return-Path: X-Original-To: apmail-drill-issues-archive@minotaur.apache.org Delivered-To: apmail-drill-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7901B11A7E for ; Wed, 10 Sep 2014 23:43:55 +0000 (UTC) Received: (qmail 21289 invoked by uid 500); 10 Sep 2014 23:43:55 -0000 Delivered-To: apmail-drill-issues-archive@drill.apache.org Received: (qmail 21262 invoked by uid 500); 10 Sep 2014 23:43:55 -0000 Mailing-List: contact issues-help@drill.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.incubator.apache.org Delivered-To: mailing list issues@drill.incubator.apache.org Received: (qmail 21244 invoked by uid 99); 10 Sep 2014 23:43:55 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 10 Sep 2014 23:43:55 +0000 X-ASF-Spam-Status: No, hits=-2001.7 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.3] (HELO mail.apache.org) (140.211.11.3) by apache.org (qpsmtpd/0.29) with SMTP; Wed, 10 Sep 2014 23:43:53 +0000 Received: (qmail 21184 invoked by uid 99); 10 Sep 2014 23:43:33 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 10 Sep 2014 23:43:33 +0000 Date: Wed, 10 Sep 2014 23:43:33 +0000 (UTC) From: "Chun Chang (JIRA)" To: issues@drill.incubator.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (DRILL-1401) join on csv files does not work MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/DRILL-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chang updated DRILL-1401: ------------------------------ Attachment: aggregate_100r.csv The csv file used in the query. The other file (join_100r.csv) is a duplicate of the same file. > join on csv files does not work > ------------------------------- > > Key: DRILL-1401 > URL: https://issues.apache.org/jira/browse/DRILL-1401 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Affects Versions: 0.5.0 > Reporter: Chun Chang > Attachments: aggregate_100r.csv > > > #Wed Sep 10 13:35:05 PDT 2014 > git.commit.id.abbrev=686eb9e > join (inner,full outer,left,right) does not work if the join is directly applied to csv files. It will produce either null values on the projected columns or NumberFormatException depends on joining condition applied on which columns (first or second columns). If you create views on the csv files, then join works on the views. > For example, the following join give null values: > 0: jdbc:drill:schema=dfs> select cast(`aggregate_100r.csv`.columns[0] as int), cast(`aggregate_100r.csv`.columns[1] as int), cast(`join_100r.csv`.columns[1] as int) from `aggregate_100r.csv` inner join `join_100r.csv` on cast(`aggregate_100r.csv`.columns[0] as int) = cast(`join_100r.csv`.columns[0] as int); > +------------+------------+------------+ > | EXPR$0 | EXPR$1 | EXPR$2 | > +------------+------------+------------+ > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 0 | null | null | > | 1 | null | null | > | 1 | null | null | > | 1 | null | null | > | 1 | null | null | > The following give NumberFormatException: > 0: jdbc:drill:schema=dfs> select cast(`aggregate_100r.csv`.columns[0] as int), cast(`aggregate_100r.csv`.columns[1] as int), cast(`join_100r.csv`.columns[1] as int) from `aggregate_100r.csv` inner join `join_100r.csv` on cast(`aggregate_100r.csv`.columns[1] as int) = cast(`join_100r.csv`.columns[1] as int); > Query failed: Failure while running fragment. [3f67299e-f312-445b-8d6b-74984a820f0c] > Error: exception while executing query: Failure while trying to get next result batch. (state=,code=0) > The following with views works: > 0: jdbc:drill:schema=dfs> select `aggregate_100r_v`.c0, `aggregate_100r_v`.c1, `join_100r_v`.c1 from `aggregate_100r_v` inner join `join_100r_v` on `aggregate_100r_v`.c0 = `join_100r_v`.c0; > +------------+------------+------------+ > | c0 | c1 | c10 | > +------------+------------+------------+ > | 0 | 0 | 0 | > | 0 | 0 | 2 | > | 0 | 0 | 1 | > | 0 | 0 | 0 | > | 0 | 0 | 0 | > | 0 | 0 | 2 | > | 0 | 0 | 1 | > | 0 | 0 | 0 | > | 0 | 1 | 0 | > | 0 | 1 | 2 | > | 0 | 1 | 1 | > | 0 | 1 | 0 | > | 0 | 2 | 0 | > | 0 | 2 | 2 | > | 0 | 2 | 1 | > | 0 | 2 | 0 | > | 1 | 1 | 1 | > | 1 | 1 | 2 | > | 1 | 1 | 1 | > | 1 | 1 | 0 | -- This message was sent by Atlassian JIRA (v6.3.4#6332)