hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Parquet timestamp storage in Hive and impact when using Impala to read timestamp values
Date Fri, 02 Dec 2016 15:01:02 GMT
Guys,

I have come across a situation where a multi-tenant cluster is being used
to read and write Parquet files.

This causes some issues. As I understand it, when Hive stores a timestamp
in Parquet format, it converts the local time to UTC, and when it reads the
data back, it converts it back to local time.

Impala, on the other hand, does not do any conversion when it reads the
timestamp column from a Parquet file, so the UTC time is returned instead
of the local time (a sketch of this behaviour follows below).
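For illustration, a minimal Python sketch of the behaviour as I understand
it; the timezone and variable names are my own assumptions for the example,
not Hive or Impala internals:

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# assume a session running in Berlin time (purely for the example)
local_tz = ZoneInfo("Europe/Berlin")

# Hive write path: the local timestamp is normalised to UTC before it
# is stored in the Parquet file
local_ts = datetime(2016, 12, 2, 15, 0, 0, tzinfo=local_tz)
stored_utc = local_ts.astimezone(timezone.utc)

# Hive read path: the stored UTC value is converted back to local time,
# so Hive returns the value that was originally written
hive_read = stored_utc.astimezone(local_tz)

# Impala read path: no conversion is applied, so the UTC instant is
# returned instead of the local time
impala_read = stored_utc

print("written by Hive :", local_ts)
print("read by Hive    :", hive_read)
print("read by Impala  :", impala_read)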

So there are multiple issues:

- Data read by Impala is not converted from UTC to local time.
- A flag can be set to make Impala do the conversion, but only at the
  cluster level (see the note after this list).
- One group is saying they don't want to do the conversion at the
  application level.
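For reference, the cluster-level setting I am referring to is, if I
remember correctly, the impalad startup flag
-convert_legacy_hive_parquet_utc_timestamps=true, which makes Impala apply
the UTC-to-local conversion when reading Parquet timestamps written by Hive.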

So setting that flag will cure the problem for some tenants but make other
tenants less happy with the conversion.

Now, my understanding is that this issue comes about because Impala
bypasses the Hive metadata and goes directly to the Parquet files.

There is an impact to the business.

My suggestion is that if they want performant reads, they should use Spark
SQL on Hive; it will always get the same values as stored by Hive (see the
sketch below).
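Something along these lines; the database, table and column names are only
placeholders for the example:

from pyspark.sql import SparkSession

# Spark session with Hive support, so Spark SQL can read the Hive table
spark = (SparkSession.builder
         .appName("hive-parquet-timestamps")
         .enableHiveSupport()
         .getOrCreate())

# hypothetical database/table/column, purely for illustration; per the
# suggestion above, this should return the timestamps as Hive stored them
df = spark.sql("SELECT event_ts FROM mydb.events LIMIT 10")
df.show(truncate=False)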

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
