hive-user mailing list archives

From "Larson, Kurt" <klar...@wbgames.com>
Subject RE: ORC Transaction Table - Spark
Date Thu, 24 Aug 2017 15:42:19 GMT
Just some clarifying points please.

1. Is this the general case for all file formats?

2. Or is this an artifact of ORC files written by the Hive 2.x ORC serde not being readable by the Hive 1.x ORC serde?

3. Is there a difference in the ORC file format spec at play here?

4. Or is any incompatibility limited to the Hive ORC serde implementations in Hive 1.x and 2.x?

5. What’s the mechanism that affects Spark here?

   a. Same ORC serdes as Hive?

   b. Similar issues in the Spark ORC serde implementation(s) as in the Hive 1.x ORC serde?

6. Any similar issues with the Parquet format in Hive 1.x and 2.x?


From: Aviral Agarwal [mailto:aviral12028@gmail.com]
Sent: Wednesday, August 23, 2017 10:34 PM
To: user@hive.apache.org
Subject: Re: ORC Transaction Table - Spark

So, is there currently no way for Spark to read Hive 2.x data?

On Thu, Aug 24, 2017 at 12:17 AM, Eugene Koifman <ekoifman@hortonworks.com> wrote:
It looks like you have data written by Hive 2.x and Hive 1.x code trying to read it.
That is not supported.

From: Aviral Agarwal <aviral12028@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, August 23, 2017 at 12:24 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: ORC Transaction Table - Spark

Hi,

Yes, it is caused by the wrong naming convention of the delta directory:

/apps/hive/warehouse/foo.db/bar/year=2017/month=5/delta_0645253_0645253_0001

How do I solve this?

Thanks!
Aviral Agarwal
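
For illustration, the NumberFormatException on "0645253_0001" is what one would expect if the reader treats the directory name as delta_<min>_<max> and parses everything after the first underscore as a single number. The sketch below is an assumption about that behaviour, not Hive's actual code; parse_long, parse_delta_two_part and parse_delta_with_suffix are hypothetical names, and reading the trailing field as a statement id is also an assumption.

```python
import re

DELTA_PREFIX = "delta_"


def parse_long(text):
    """Strict decimal parse, roughly like Java's Long.parseLong.

    Python's int() accepts underscores between digits, so it is not
    used directly on the raw string here.
    """
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError('For input string: "%s"' % text)
    return int(text)


def parse_delta_two_part(dirname):
    """Assumed older-style parse: delta_<min>_<max>, nothing after <max>."""
    rest = dirname[len(DELTA_PREFIX):]
    split = rest.index("_")
    min_txn = parse_long(rest[:split])
    # Everything after the first underscore is treated as <max>,
    # so an extra "_0001" suffix ends up inside the number and is rejected.
    max_txn = parse_long(rest[split + 1:])
    return min_txn, max_txn


def parse_delta_with_suffix(dirname):
    """Assumed newer-style parse: delta_<min>_<max>[_<suffix>]."""
    parts = dirname[len(DELTA_PREFIX):].split("_")
    if len(parts) not in (2, 3):
        raise ValueError("unexpected delta directory name: %s" % dirname)
    min_txn = parse_long(parts[0])
    max_txn = parse_long(parts[1])
    suffix = parse_long(parts[2]) if len(parts) == 3 else None
    return min_txn, max_txn, suffix


if __name__ == "__main__":
    name = "delta_0645253_0645253_0001"
    try:
        parse_delta_two_part(name)
    except ValueError as err:
        print("two-part parse failed:", err)
    print("three-part parse:", parse_delta_with_suffix(name))
```

Under these assumptions, the two-part parser rejects the directory from the listing above, while a parser that allows a third field accepts it.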

On Tue, Aug 22, 2017 at 11:50 PM, Eugene Koifman <ekoifman@hortonworks.com> wrote:
Could you do a recursive “ls” on the table or partition that you are trying to read?
Most likely you have files that don’t follow the expected naming convention.

Eugene
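
As a rough sketch of that check: given the paths printed by a recursive listing, the snippet below flags any delta directory whose name is not exactly delta_<digits>_<digits>. That two-field pattern is an assumption about what the reading side expects, unexpected_delta_dirs is a hypothetical helper, and the sample paths under the __main__ guard are made up for illustration.

```python
import re
from pathlib import PurePosixPath

# Assumed "expected" form: delta_<min>_<max> with nothing after <max>.
TWO_PART_DELTA = re.compile(r"delta_[0-9]+_[0-9]+")


def unexpected_delta_dirs(paths):
    """Return the sorted, de-duplicated delta directories that do not match."""
    flagged = set()
    for raw in paths:
        parts = PurePosixPath(raw).parts
        for i, part in enumerate(parts):
            if part.startswith("delta_") and not TWO_PART_DELTA.fullmatch(part):
                # Keep the path up to and including the delta directory.
                flagged.add(str(PurePosixPath(*parts[: i + 1])))
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical listing output; only the directory names matter here.
    listing = [
        "/warehouse/t/year=2017/month=5/delta_0000010_0000010/bucket_00000",
        "/warehouse/t/year=2017/month=5/delta_0000011_0000011_0001/bucket_00000",
    ]
    for path in unexpected_delta_dirs(listing):
        print(path)
```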


From: Aviral Agarwal <aviral12028@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, August 22, 2017 at 5:39 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: ORC Transaction Table - Spark

Hi,

I am trying to read a Hive ORC transactional table through Spark, but I am getting the
following error:

Caused by: java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    .....
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0645253_0001"
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
    ... 118 more

Any help would be appreciated.

Thanks and Regards,
Aviral Agarwal

