hive-user mailing list archives

From Philip Lee <philjj...@gmail.com>
Subject Re: reading ORC format on Spark-SQL
Date Wed, 10 Feb 2016 21:28:10 GMT
Thanks for your reply.

As you can see in the attached picture <figure 1>, reading an input file in
Hive or Spark-SQL involves a few steps, right?
I feel like there are some specific steps to reading the file.

For example, to load the input file, does Hive or Spark-SQL have to create a
table and then insert the input into it?
The reason I am asking this kind of question is that the time to read a csv
file on Spark increases linearly as the data size grows, but the time to read
ORC format on Spark-SQL stays the same as the data size increases, as shown
in <figure 2>.
(The left side shows the small dataset, the right side the large dataset,
but as you can see the reading time is approximately the same in both.)

Is the cause just a property of reading the ORC format, or the creation of
the table and loading of the input into it, or both?


[image: Inline image 1]
<figure 1>
[image: Inline image 1]
<figure 2>
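For reference, here is a minimal sketch of reading ORC directly in Spark-SQL, assuming a Spark 1.x-era HiveContext (the path and table name are made up for illustration). No explicit CREATE TABLE / INSERT step is needed: ORC files are self-describing, so the schema is taken from the file footer, and much of the initial "read" is metadata work rather than a full data scan, which may be why small and large files show similar load times until an action forces a scan:

```scala
import org.apache.spark.sql.hive.HiveContext

// sc is the existing SparkContext provided by the spark-shell
val hiveContext = new HiveContext(sc)

// Direct read of an ORC file into a DataFrame -- no table creation or
// insert is involved; the schema comes from the ORC file footer.
val df = hiveContext.read.format("orc").load("/data/mydata.orc")

// Registering a temp table only attaches a name to the DataFrame; the
// data is not copied or materialised until an action runs.
df.registerTempTable("mydata")

// This action actually scans the data (subject to ORC's stripe-level
// column statistics, which can let Spark skip stripes entirely).
hiveContext.sql("SELECT COUNT(*) FROM mydata").show()
```

This is only a sketch of the direct-read path; actual timing behaviour depends on the Spark version and how much of the file the query needs to touch.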

On Wed, Feb 10, 2016 at 10:01 PM, Mich Talebzadeh <mich@peridale.co.uk>
wrote:

> Hi,
>
>
>
> Are you encountering an issue with an ORC file in Spark-SQL as opposed to
> reading the same ORC file with Hive on the Spark engine?
>
>
>
> The only difference would be the Spark optimizer (AKA Catalyst) using an
> ORC file compared to the Hive optimizer doing the same thing.
>
>
>
> Please clarify the underlying issue you are facing.
>
>
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Technology
> Ltd, its subsidiaries or their employees, unless expressly so stated. It is
> the responsibility of the recipient to ensure that this email is virus
> free, therefore neither Peridale Technology Ltd, its subsidiaries nor their
> employees accept any responsibility.
>
>
>
>
>
> *From:* Philip Lee [mailto:philjjoon@gmail.com]
> *Sent:* 10 February 2016 20:39
> *To:* user@hive.apache.org
> *Subject:* reading ORC format on Spark-SQL
>
>
>
> What kind of steps are involved in reading ORC format on Spark-SQL?
>
> I mean that reading a csv file is usually just reading the dataset
> directly into memory.
>
>
>
> But I feel like Spark-SQL takes some extra steps when reading ORC format.
>
> For example, does it have to create a table and then insert the dataset
> into it? Are these steps part of the reading step in Spark-SQL?
>
>
>
> [image: Inline image 1]
>
>
>



-- 

==========================================================

*Hae Joon Lee*


Now, in Germany,

M.S. Candidate, Interested in Distributed System, Iterative Processing

Dept. of Computer Science, Informatik in German, TUB

Technical University of Berlin


In Korea,

M.S. Candidate, Computer Architecture Laboratory

Dept. of Computer Science, KAIST


Rm# 4414 CS Dept. KAIST

373-1 Guseong-dong, Yuseong-gu, Daejon, South Korea (305-701)


Mobile) 49) 015-251-448-278 in Germany, no cellular in Korea

==========================================================
