hive-user mailing list archives

From Cristian Giha
Subject Hive Sort By clause on Table creation
Date Fri, 01 Aug 2014 20:42:24 GMT

I am trying to test some of the optimizations that partitioning and clustering tables can provide, but
I have a doubt about how the SORTED BY clause works on a table.
The case is the following:
I create a simple bucketed table as:

CLUSTERED BY (ID) SORTED BY (ID) INTO 4 BUCKETS;

I am setting these configuration parameters:

* set hive.enforce.bucketing=true;

* set hive.enforce.sorting=true;
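
To make the setup concrete, here is a minimal sketch of the whole sequence in HiveQL. The table name sample_bucketed, the payload column, and the staging table sample_staging are placeholders I made up for illustration; only the CLUSTERED BY / SORTED BY clause and the two settings come from my actual test.

```sql
-- Hypothetical names: sample_bucketed, payload and sample_staging are placeholders.
-- Ask Hive to honor the bucketing and sort order declared on the table when writing.
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- Bucketed table: rows are hashed on id into 4 buckets,
-- and each bucket file is written sorted by id.
CREATE TABLE sample_bucketed (
  id      INT,
  payload STRING
)
CLUSTERED BY (id) SORTED BY (id) INTO 4 BUCKETS;

-- Load through an INSERT ... SELECT so that Hive itself does the bucketing and sorting.
INSERT OVERWRITE TABLE sample_bucketed
SELECT id, payload
FROM sample_staging;
```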

Suppose that I have a sample of 1 million rows for this table, and the data is automatically stored
ordered by the ID column.
Now I am trying to check whether queries are optimized for the sorted data. I loaded rows with
non-ordered ids, some of them duplicated. Since I told Hive that the data is sorted
by id, I expected that a search by id would stop once it finds the first match and no further
consecutive rows with the same id follow, and return only that first row. In practice this is not
happening, and the query returns all the rows with the same id.
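
As an illustration of the difference between what I run and what I expected, here is a small HiveQL sketch. The table name sample_bucketed and the id value 42 are placeholders, not my real data, and the second statement is only my assumption of how a single-row result would have to be requested explicitly.

```sql
-- Hypothetical names: sample_bucketed stands for the bucketed table, 42 for a duplicated id.

-- What I run: this returns every row whose id is 42, duplicates included.
SELECT *
FROM sample_bucketed
WHERE id = 42;

-- What I expected to get back: a single row for that id.
-- Written explicitly, that would need a LIMIT on the query.
SELECT *
FROM sample_bucketed
WHERE id = 42
LIMIT 1;
```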

Is my understanding of how the SORTED BY clause helps the query wrong, or is something else going on?

Sorry for my bad English.
I hope you can help.