drill-dev mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (DRILL-1345) Drill can write to Amazon S3 storage buckets but not read from them
Date Sun, 04 Jan 2015 21:29:40 GMT

     [ https://issues.apache.org/jira/browse/DRILL-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau resolved DRILL-1345.
-----------------------------------
    Resolution: Fixed

> Drill can write to Amazon S3 storage buckets but not read from them
> -------------------------------------------------------------------
>
>                 Key: DRILL-1345
>                 URL: https://issues.apache.org/jira/browse/DRILL-1345
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet, Storage - Text & CSV
>    Affects Versions: 0.4.0, 0.5.0
>         Environment: CentOS 6.3 on Amazon Web Services virtual instance
>            Reporter: David Tucker
>            Assignee: David Tucker
>            Priority: Critical
>             Fix For: Future
>
>
> After configuring the storage plug-in for Amazon S3, Drill commands will correctly create
> Parquet or CSV files in the S3 bucket. However, attempting to read those files results in
> a software hang.
> To reproduce the issue:
>    Confirm Hadoop access to the bucket from the shell with
>        'hadoop fs -ls s3://<bucket>/'
>     If Hadoop access fails, the likely cause is incorrect user
>         authentication settings in core-site.xml. You'll need appropriate
>         AWS authentication keys for the following properties:
>             fs.s3.awsAccessKeyId
>             fs.s3.awsSecretAccessKey
>             fs.s3n.awsAccessKeyId
>             fs.s3n.awsSecretAccessKey
>     Configure the S3 storage plug-in (a clone of the default DFS plug-in
>       with a single change: the connection string should be
>       "s3://<bucket>"). This CANNOT BE DONE until actual
>       connectivity to the bucket is verified (a separate issue: storage
>       plug-in configuration MUST connect to the target
>       connection string or it fails).
> Simple queries to create tables in the S3 bucket will work.
>   alter session set `store.format`='parquet' ;
>   create table `my-s3`.`/employee1` as select * from cp.`employee.json` ;
>  
>   alter session set `store.format`='csv' ;
>   create table `my-s3`.`/employee2` as select * from cp.`employee.json` ;
>  
> Confirm the existence of the files in the S3 bucket, and the readability of their contents,
> with "hadoop fs" commands.
> Attempts to read the same tables will hang:
>      select * from `my-s3`.`/employee1`
> "jstack -F <drillbit_pid>" indicates there is a deadlock of some kind.
> NOTE: The jets3t class enabling S3 data access from MapR Hadoop 4.0.1 client was incompatible
> with Drill 0.4 and 0.5. I had to leave the jets3t library excluded (via hadoop-excludes.txt)
> and copy in older jets3t support from MapR 3.0.3.
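
For reference, the four authentication properties named in the reproduction steps could be set in core-site.xml roughly as follows. This is only a sketch: the property names come from the report, while the values are placeholders and the surrounding layout is an assumption about a typical Hadoop configuration file, not the reporter's actual settings.

```xml
<!-- Sketch of a core-site.xml fragment. All values are placeholders, not real keys. -->
<configuration>
  <!-- Credentials used for s3:// URIs -->
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
  <!-- Credentials used for s3n:// URIs -->
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```

With settings like these in place, the 'hadoop fs -ls s3://<bucket>/' check from the report is the quickest way to confirm that the keys are being picked up before configuring the storage plug-in.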



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
