drill-issues mailing list archives

From "SAIKRISHNA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2223) Empty parquet file created with Limit 0 query errors out when querying
Date Tue, 29 Nov 2016 08:10:58 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704613#comment-15704613 ]

SAIKRISHNA commented on DRILL-2223:
-----------------------------------

Is there any way to create an empty Parquet table that has a schema but zero records? For my
business use case we need to create an empty Parquet file with its schema, as is created for
JSON with zero records.

I am trying the query below, which writes zero records:
create table target.HIVE.employeeTest2911 AS SELECT * FROM cp.`employee.json` where employee_id > 1157

Fragment    Number of records written
0_0         0

When I then run this:
             select * from target.HIVE.employeeTest2911
I get this exception:
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From line 1, column
16 to line 1, column 21: Table 'target.HIVE.employeeTest2911' not found SQL Query null [Error
Id: 5ee67a9b-b3ec-4ac8-88bd-13d8428f1d48 on DataNode1:31010]

The workspace configuration is as follows:

{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://XXXXXXXXXXX:8020",
  "config": null,
  "workspaces": {
    "HIVE": {
      "location": "/user/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  }
}
Is there a solution for this? If anyone knows how to overcome it, please let me know.
Thanks in advance.

> Empty parquet file created with Limit 0 query errors out when querying
> ----------------------------------------------------------------------
>
>                 Key: DRILL-2223
>                 URL: https://issues.apache.org/jira/browse/DRILL-2223
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 0.7.0
>            Reporter: Aman Sinha
>             Fix For: Future
>
>
> Doing a CTAS with limit 0 creates a 0 length parquet file which errors out during querying.
> This should at least write the schema information and metadata which will allow queries to
> run.
> {code}
> 0: jdbc:drill:zk=local> create table tt_nation2 as select n_nationkey, n_name, n_regionkey from cp.`tpch/nation.parquet` limit 0;
> +------------+---------------------------+
> |  Fragment  | Number of records written |
> +------------+---------------------------+
> | 0_0        | 0                         |
> +------------+---------------------------+
> 1 row selected (0.315 seconds)
> 0: jdbc:drill:zk=local> select n_nationkey from tt_nation2;
> Query failed: RuntimeException: file:/tmp/tt_nation2/0_0_0.parquet is not a Parquet file (too small)
> {code}
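
The "too small" error in the issue description is consistent with how the Parquet format lays
out a file: a 4-byte "PAR1" magic at the start, and a 4-byte footer length followed by another
"PAR1" magic at the end, so a valid file is at least 12 bytes long. The sketch below is not
Drill code; it is a minimal, stdlib-only Python check, with a hypothetical function name
(check_parquet_file), that shows why a 0-length 0_0_0.parquet cannot be read. It only checks
the outer framing and does not parse the footer metadata.

```python
import os

PARQUET_MAGIC = b"PAR1"
# Leading magic (4 bytes) + footer length (4 bytes) + trailing magic (4 bytes).
MIN_PARQUET_SIZE = len(PARQUET_MAGIC) + 4 + len(PARQUET_MAGIC)


def check_parquet_file(path):
    """Return "too small", "bad magic", or "ok" for the file at `path`.

    Hypothetical helper for illustration: it checks only the outer
    framing of the file and does not read the footer metadata.
    """
    size = os.path.getsize(path)
    if size < MIN_PARQUET_SIZE:
        # A 0-length file written by a zero-row CTAS ends up here.
        return "too small"
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    if head != PARQUET_MAGIC or tail != PARQUET_MAGIC:
        return "bad magic"
    return "ok"
```

Running check_parquet_file on an empty file returns "too small", which is the condition the
issue asks Drill to avoid by writing at least the schema and metadata.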



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
