hive-user mailing list archives

From Ujjwal <ujjwal.wadha...@gmail.com>
Subject Re: only timestamp column value of previous row gets reset
Date Fri, 29 May 2015 18:20:23 GMT
Hi all,

The issue can be reproduced in a simple Java program (code attached for
reference/use) in which I do not use each record right away after reading it
from the iterator, but store it in a vector for later use. As per my
understanding, a record should not change once it has been handed to the
consumer. However, the timestamp object gets reset under the one condition
explained earlier: when the timestamp column of the next row is null.


Create a table
---------------------

create table if not exists sample (dtcol date, tscol timestamp, stcol
string) row format delimited fields terminated by ',' stored as textfile;
truncate table sample;



Input data (input)
------------------------

9779-11-21,2014-04-01 11:30:55,abc
9779-11-21,2014-04-04 11:30:55,def
,null,



Load the data
-------------------

hadoop fs -put input /apps/hive/warehouse/sample



Check
---------

hive> select * from sample;

OK

9779-11-21      2014-04-01 11:30:55     abc
9779-11-21      2014-04-04 11:30:55     def
NULL    NULL
Time taken: 0.029 seconds, Fetched: 3 row(s)
hive>



Execute
------------

export CLASSPATH=`hadoop classpath`:`hcat -classpath`
java -classpath SampleHCatReader.jar:$CLASSPATH org.my.internal.SampleHCatReader



Output showing the timestamp reset
----------------------------------

HCat record right after reading is  9779-11-21  2014-04-01 11:30:55.0   abc
HCat record right after reading is  9779-11-21  2014-04-04 11:30:55.0   def
HCat record right after reading is  null        null

HCat record later is 9779-11-21 2014-04-01 11:30:55.0   abc
HCat record later is 9779-11-21 1969-12-31 19:00:00.0   def
HCat record later is null       null



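As seen above, the timestamp of the second row (def) is intact right after
reading but shows 1969-12-31 19:00:00.0 when the stored record is printed later.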
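To make the observation concrete, here is a minimal, self-contained sketch of one mechanism that would produce exactly this output. It is an assumption, not something confirmed in this thread: the reader side is modeled as keeping a reference to the last java.sql.Timestamp it handed out and resetting that same object to epoch 0 when the next field is null. The class TimestampResetSketch and its TsHolder and readAll members are hypothetical names for illustration only and are not Hive or HCatalog classes. The second pass shows a possible consumer-side workaround: copying the timestamp before buffering the record.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

public class TimestampResetSketch {

    /** Hypothetical model of a reader-side timestamp cell (NOT a Hive class). */
    static final class TsHolder {
        private Timestamp current = new Timestamp(0L);

        /** Returns the object handed to the consumer for one raw field. */
        Timestamp set(String raw) {
            if (raw == null || raw.equals("null")) {
                // Assumed behavior: the last object is reset in place,
                // so a previously returned reference changes as well.
                current.setTime(0L);
                return null;
            }
            current = Timestamp.valueOf(raw); // fresh object for a non-null field
            return current;
        }
    }

    /** Buffers all rows; copies each timestamp first when defensiveCopy is true. */
    static List<Object[]> readAll(String[][] rows, boolean defensiveCopy) {
        TsHolder holder = new TsHolder();
        List<Object[]> buffer = new ArrayList<Object[]>();
        for (String[] raw : rows) {
            Timestamp ts = holder.set(raw[1]);
            if (defensiveCopy && ts != null) {
                Timestamp copy = new Timestamp(ts.getTime());
                copy.setNanos(ts.getNanos());
                ts = copy; // the buffer no longer shares the reader's object
            }
            buffer.add(new Object[] { raw[0], ts, raw[2] });
        }
        return buffer;
    }

    // Same three rows as the input file above.
    static final String[][] ROWS = {
        { "9779-11-21", "2014-04-01 11:30:55", "abc" },
        { "9779-11-21", "2014-04-04 11:30:55", "def" },
        { "", "null", "" }
    };

    public static void main(String[] args) {
        for (Object[] r : readAll(ROWS, false)) {
            System.out.println("shared: " + r[0] + "\t" + r[1] + "\t" + r[2]);
        }
        for (Object[] r : readAll(ROWS, true)) {
            System.out.println("copied: " + r[0] + "\t" + r[1] + "\t" + r[2]);
        }
    }
}
```

In this model only the row immediately before the null row is affected, because each non-null field gets a fresh object and only the most recent one is reset. Epoch 0 is printed as 1969-12-31 19:00:00.0 by a JVM whose default time zone is UTC-5, which matches the value in the output above.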


Regards,
Ujjwal W

On Wed, May 27, 2015 at 4:20 PM, Ujjwal <ujjwal.wadhawan@gmail.com> wrote:

> Hi,
>
>
>
> I want to cross-check a scenario with you and make sure it's not a problem
> on my end.
>
>
> I am trying to do an HCatalog read on an edge node and I am seeing strange
> behavior with the timestamp data type. My Hive version is 0.13.0.2.
>
>
>
> First, this is how the documentation suggests reading
> (https://cwiki.apache.org/confluence/display/Hive/HCatalog+ReaderWriter):
>
>
>
> for (InputSplit split : readCntxt.getSplits()) {
>     HCatReader reader = DataTransferFactory.getHCatReader(split,
>             readCntxt.getConf());
>     Iterator<HCatRecord> itr = reader.read();
>     while (itr.hasNext()) {
>         HCatRecord read = itr.next();
>     }
> }
>
>
> I am storing each record *read* in a buffer for later use in main().
> Later I access the records from the stored buffer and drain it by
> printing out the rows in another thread, and I see the following behavior.
>
>
>
> “The column value of data type *timestamp* of a previous row gets reset
> to *1969-12-31 19:00:00.0* when the column value in the current row is
> *null*. Columns of other data types in the previous row are not affected
> by a *null* in the current row. Also, changing the order of columns in
> the source data doesn’t change the behavior.”
>
>
>
>
> hive> describe bug;
>
> dtcol                   date
>
> tscol                   timestamp
>
> stcol                   string
>
> Time taken: 0.058 seconds, Fetched: 3 row(s)
>
> hive> select * from bug;
> 9779-11-21      2014-04-01 11:30:55     abc
> 9779-11-21      2014-04-04 11:30:55     def
> NULL    NULL
>
>
> Read in thread - 9779-11-21     2014-04-01 11:30:55.0   abc
> Read in thread - 9779-11-21     *1969-12-31 19:00:00.0*   def
> Read in thread - null   null
>
>
> Could this be an issue in Hive's timestamp implementation?
>
> Regards,
> Ujjwal
>
