hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Hive LLAP with Parquet format
Date Thu, 04 May 2017 23:28:38 GMT
The Parquet vs. ORC thing has to be the biggest detractor. You're forced to
choose between a format that is good for Impala and one that is good for Hive.

On May 4, 2017 3:57 PM, "Gopal Vijayaraghavan" <gopalv@apache.org> wrote:

> Hi,
>
>
> > Does Hive LLAP work with Parquet format as well?
>
>
>
> LLAP does work with the Parquet format, but it does not work very fast,
> because the java Parquet reader is slow.
>
> https://issues.apache.org/jira/browse/PARQUET-131
> +
>
> https://issues.apache.org/jira/browse/HIVE-14826
>
> On your question in particular: Parquet's columnar data reads haven't been
> optimized for Azure/S3/GCS.
>
> There was a comparison of ORC vs Parquet for NYC taxi data and it found
> that for simple queries Parquet read ~4x more data over the network - your
> problem might be bandwidth related.
>
> You might want to convert a small amount to ORC and see whether the
> BYTES_READ drops or not.
>
> In my tests with a recent LLAP, Text data was faster on LLAP on S3 & Azure
> than Parquet, because Text has a vectorized reader & cache support.
>
> Cheers,
>
> Gopal
>
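
The ORC experiment suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names, the table names and the filter predicate are hypothetical, and the BYTES_READ values have to be copied by hand from the counters reported for each query run. The first helper only builds a HiveQL CTAS string; it does not talk to Hive.

```python
def sample_ctas(source_table: str, target_table: str, fmt: str, where: str) -> str:
    """Build a HiveQL CTAS statement copying a filtered slice into a new table.

    All table names and the predicate are caller-supplied placeholders.
    """
    fmt = fmt.upper()
    if fmt not in ("ORC", "PARQUET"):
        raise ValueError("fmt must be ORC or PARQUET")
    return (
        f"CREATE TABLE {target_table} STORED AS {fmt} AS "
        f"SELECT * FROM {source_table} WHERE {where}"
    )


def bytes_read_ratio(parquet_bytes: int, orc_bytes: int) -> float:
    """Return how many times more bytes the Parquet run read than the ORC run.

    Both arguments are BYTES_READ counter values noted down by hand after
    running the same query against each copy of the data.
    """
    if orc_bytes <= 0:
        raise ValueError("orc_bytes must be positive")
    return parquet_bytes / orc_bytes
```

Building both sample tables from the same predicate keeps the comparison on identical rows. A ratio well above 1 would point toward the bandwidth explanation; a ratio near 1 would suggest looking elsewhere.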
