hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ryan Harris <Ryan.Har...@zionsbancorp.com>
Subject RE: External sorted tables
Date Mon, 03 Aug 2015 18:11:42 GMT
Sort-Merge-Bucket (SMB) joins and MAPJOINs are related, but separate...

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization

I think you should still be able to get map-side joins (if the join table is small enough)....

Nothing here that says sorting is required:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-MapJoinRestrictions

You need sorting (and bucketing) for the SMB join, but you aren't specifying bucketing.

I'd suggest either bucketing the data along with sorting it, or try it without 'sorted by'
and see if you can execute a mapjoin.


From: David Capwell [mailto:dcapwell@gmail.com]
Sent: Monday, August 03, 2015 11:59 AM
To: user@hive.apache.org
Subject: RE: External sorted tables


Mostly wanted to tell hive it's sorted so it could use more efficient joins like a map side
join.  No other reason
On Aug 3, 2015 10:47 AM, "Ryan Harris" <Ryan.Harris@zionsbancorp.com<mailto:Ryan.Harris@zionsbancorp.com>>
wrote:
Unless you are using bucketing and sampling, there is no benefit (that I can think of) to
informing hive that the data *is* in fact sorted...

If there is something specific you are trying to accomplish by specifying the sort order of
that column, perhaps you can elaborate on that.  Otherwise, leave out the 'sorted by' statement
and you should be fine.

From: David Capwell [mailto:dcapwell@gmail.com<mailto:dcapwell@gmail.com>]
Sent: Monday, August 03, 2015 10:50 AM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: External sorted tables


Based off the ddl it is required to have buckets, I was wondering if there was a way to get
around it?

Thinking as a hack I could try bucket=1, but if there is a better way would love to know
On Aug 2, 2015 6:18 PM, "Takahiko Saito" <tysaito@gmail.com<mailto:tysaito@gmail.com>>
wrote:
Hi,

Is it possible that 'create table sorted by' must have buckets?

I found the below statements in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL:

"The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into
a table – only how it is read. This means that users must be careful to insert data correctly
by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER
BY and SORT BY commands in their query."

On Thu, Jul 30, 2015 at 7:22 PM, David Capwell <dcapwell@gmail.com<mailto:dcapwell@gmail.com>>
wrote:
We are trying to create a external table in hive. This data is sorted,
so wanted to tell hive about this. When I do, it complains about
parsing the create.

> CREATE EXTERNAL TABLE IF NOT EXISTS store.testing (
...
. . . . . . . . . . . . . . . . . . .>   timestamp bigint,
...)
. . . . . . . . . . . . . . . . . . .>   SORTED BY (timestamp)
...
. . . . . . . . . . . . . . . . . . .>   LOCATION '/project/db/table'
. . . . . . . . . . . . . . . . . . .> ;
Error: Error while compiling statement: FAILED: ParseException line
1:507 missing EOF at 'SORTED' near ')' (state=42000,code=40000)
2: jdbc:hive2://localhost:10000/store>

What can I do to let hive know that my data is sorted? Every example
online of sorted by is grouped with buckets, but we really don't want
to add bucketing.


Hive version: 0.14.0

Thanks for your help!



--
Takahiko Saito
________________________________
THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain
information that is privileged and exempt from disclosure under applicable law. If you are
neither the intended recipient nor responsible for delivering the message to the intended
recipient, please note that any dissemination, distribution, copying or the taking of any
action in reliance upon the message is strictly prohibited. If you have received this communication
in error, please notify the sender immediately. Thank you.

======================================================================
THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain
information that is privileged and exempt from disclosure under applicable law. If you are
neither the intended recipient nor responsible for delivering the message to the intended
recipient, please note that any dissemination, distribution, copying or the taking of any
action in reliance upon the message is strictly prohibited. If you have received this communication
in error, please notify the sender immediately.  Thank you.
Mime
View raw message