sqoop-dev mailing list archives

From "Ram (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-3151) Sqoop export HDFS file type auto detection can pick wrong type
Date Fri, 28 Aug 2020 13:56:00 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186552#comment-17186552 ]

Ram commented on SQOOP-3151:
----------------------------

[~sanysandish@gmail.com] [~BoglarkaEgyed]

We are using *sqoop 1.4.7* to export Parquet data that is stored in HDFS. These are *plain
Parquet files and NOT a Hive table*.

We are still facing the same issue:

 
{code:java}
20/08/28 13:37:02 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetIOException:
Cannot access descriptor location: hdfs:///<location>/part-00000-f9f92493-36a1-4714-bcc6-291c118cf599-c000/snappy/parquet/.metadata
org.kitesdk.data.DatasetIOException: Cannot access descriptor location:  hdfs:///<location>/part-00000-f9f92493-36a1-4714-bcc6-291c118cf599-c000/snappy/parquet/.metadata{code}
The command we're running:

 
{code:java}
/sqoop-1.4.7.bin__hadoop-2.6.0/bin/sqoop export --connect jdbc:postgresql://<postgres_db_details>
--username <username> --password <password> --table <table_name> --export-dir
hdfs:///<location>/part-00000-f9f92493-36a1-4714-bcc6-291c118cf599-c000.parquet
{code}
Postgres JAR - postgresql-42.2.11.jar

Could you please suggest a solution or workaround as soon as possible?

> Sqoop export HDFS file type auto detection can pick wrong type
> --------------------------------------------------------------
>
>                 Key: SQOOP-3151
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3151
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Boglarka Egyed
>            Assignee: Sandish Kumar HN
>            Priority: Major
>
> It appears that Sqoop export tries to detect the file format by reading the first 3 characters
of a file. Based on that header, the appropriate file reader is used. However, if the exported
data itself happens to start with one of these header sequences, the wrong reader is chosen,
resulting in a misleading error.
> For example, suppose someone is exporting a table in which one of the field values is "PART".
Because Sqoop sees the letters "PAR", it invokes the Kite SDK on the assumption that the file is
in Parquet format. This leads to a misleading error:
> ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException:
Descriptor location does not exist: hdfs://<path>.metadata 
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://<path>.metadata
> This can be reproduced easily, using Hive as a real world example:
> > create table test (val string);
> > insert into test values ('PAR');
> Then run a sqoop export against the table data:
> $ sqoop export --connect $MYCONN --username $MYUSER --password $MYPWD -m 1 --export-dir
/user/hive/warehouse/test --table $MYTABLE
> Sqoop will fail with the following:
> ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException:
Descriptor location does not exist: hdfs://<path>.metadata
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://<path>.metadata
> Changing the value from 'PAR' to something else, like 'Obj' (Avro) or 'SEQ' (sequencefile),
results in similar errors.
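
The misdetection described in the issue can be illustrated with a minimal sketch. This is not Sqoop's actual code: the function name `detect_file_type` and the mapping table are hypothetical, and the only assumption taken from the description is that the first 3 characters of a file select the reader ('PAR' for Parquet, 'Obj' for Avro, 'SEQ' for sequencefile, anything else treated as text).

```python
# Illustrative sketch only: mimics the header sniffing described in the issue.
# The names below are hypothetical and do not come from the Sqoop code base.
MAGIC_TO_TYPE = {
    b"PAR": "parquet",
    b"Obj": "avro",
    b"SEQ": "sequencefile",
}


def detect_file_type(data: bytes) -> str:
    """Guess the file type from the first three bytes, defaulting to text."""
    return MAGIC_TO_TYPE.get(data[:3], "text")


# A plain text export whose first field value happens to be "PART".
text_file = b"PART\nsome other value\n"

# The text file is classified as Parquet because it starts with "PAR".
print(detect_file_type(text_file))
```

Under these assumptions, any text file whose first value starts with 'PAR', 'Obj', or 'SEQ' would be handed to the wrong reader, which matches the behaviour reported above.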



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
