hive-dev mailing list archives

From Sergio Pena <sergio.p...@cloudera.com>
Subject Re: issue while reading parquet file in hive
Date Wed, 05 Aug 2015 17:30:04 GMT
Hi Santlal,

Hive uses the Parquet int96 type to write and read timestamps; the error is
probably caused by that mismatch. You can try int96 instead of binary.
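
For example, the schema string passed to ParquetTupleScheme could declare the
field as int96 instead of binary. This is only an untested sketch based on your
code below; whether your parquet-cascading version can actually write int96
values from a String tuple field (int96 timestamps are a 12-byte Julian
day/nanos encoding) is something you would need to verify:

    Scheme sinkSch = new ParquetTupleScheme(field, outputField,
            "message TimeStampTest { optional int96 timestampField; }");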

- Sergio

On Tue, Jul 21, 2015 at 1:54 AM, Santlal J Gupta <Santlal.Gupta@bitwiseglobal.com> wrote:

> Hello,
>
>
>
> I have the following issue.
>
>
>
> I have created a Parquet file through Cascading Parquet and want to load it
> into a Hive table.
>
> My data file contains data of type timestamp.
>
> Cascading Parquet does not support the timestamp data type, so while
> creating the Parquet file I declared the field as binary. After generating
> the Parquet file, it loaded successfully into Hive.
>
>
>
> While creating the Hive table, I declared the column type as timestamp.
>
>
>
> Code:
>
> package com.parquet.TimestampTest;
>
> import cascading.flow.FlowDef;
> import cascading.flow.hadoop.HadoopFlowConnector;
> import cascading.pipe.Pipe;
> import cascading.scheme.Scheme;
> import cascading.scheme.hadoop.TextDelimited;
> import cascading.tap.SinkMode;
> import cascading.tap.Tap;
> import cascading.tap.hadoop.Hfs;
> import cascading.tuple.Fields;
> import parquet.cascading.ParquetTupleScheme;
>
> public class GenrateTimeStampParquetFile {
>
>     static String inputPath = "target/input/timestampInputFile1";
>     static String outputPath = "target/parquetOutput/TimestampOutput";
>
>     public static void main(String[] args) {
>         write();
>     }
>
>     private static void write() {
>         // Source: newline-delimited text file with a single String field
>         Fields field = new Fields("timestampField").applyTypes(String.class);
>         Scheme sourceSch = new TextDelimited(field, false, "\n");
>
>         // Sink: Parquet file whose schema declares the timestamp field as binary
>         Fields outputField = new Fields("timestampField");
>         Scheme sinkSch = new ParquetTupleScheme(field, outputField,
>                 "message TimeStampTest { optional binary timestampField; }");
>
>         Tap source = new Hfs(sourceSch, inputPath);
>         Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);
>
>         Pipe pipe = new Pipe("Hive timestamp");
>
>         FlowDef fd = FlowDef.flowDef().addSource(pipe, source).addTailSink(pipe, sink);
>
>         new HadoopFlowConnector().connect(fd).complete();
>     }
> }
>
>
>
> Input file: timestampInputFile1
>
> timestampField
> 1988-05-25 15:15:15.254
> 1987-05-06 14:14:25.362
>
>
>
> After running the code, the following files are generated.
>
> Output:
>
> 1. part-00000-m-00000.parquet
> 2. _SUCCESS
> 3. _metadata
> 4. _common_metadata
>
>
>
> I have created a table in Hive to load the part-00000-m-00000.parquet file.
>
>
>
> I have written the following queries in Hive.
>
> Queries:
>
>
>
> hive> create table test3(timestampField timestamp) stored as parquet;
> hive> load data local inpath '/home/hduser/parquet_testing/part-00000-m-00000.parquet' into table test3;
> hive> select * from test3;
>
>
>
> After running the above commands, I got the following output.
>
>
>
> Output:
>
> OK
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> Failed with exception
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be
> cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
>
>
>
>
>
> But I got the above exception.
>
>
>
> Please help me to solve this problem.
>
>
>
> Currently I am using:
>
>     Hive 1.1.0-cdh5.4.2
>     Cascading 2.5.1
>     parquet-format-2.2.0
>
>
>
> Thanks
>
> Santlal J. Gupta
>
>
>
>
>
>
