hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/SortBy" by AlexSmith
Date Sat, 07 Feb 2009 19:25:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by AlexSmith:
http://wiki.apache.org/hadoop/Hive/LanguageManual/SortBy

The comment on the change is:
adds example for numeric sorting

------------------------------------------------------------------------------
  }}}
  
  
- === How to do Order By? ===
+ === Simulating Order By ===
  
  We can set the number of reducers to 1 to make sure we get the same result as ''ORDER BY''.
  
@@ -50, +50 @@

  }}}
  
  This can make the single reducer a performance bottleneck.  In many cases the user only wants to see the top N rows, where N is a small number; in that case, the LIMIT clause can be used.  We don't have an example here, but users are encouraged to provide one.
+ 
+ === Setting Types for Sort By ===
+ 
+ After a TRANSFORM, the output columns are generally treated as strings, so numeric data will be sorted lexicographically (for example, "10" sorts before "9").  To get numeric ordering, wrap the transform in a second SELECT that casts the columns to numeric types before applying SORT BY.
+ 
+ {{{
+ FROM (FROM (FROM src
+             SELECT TRANSFORM(value)
+             USING 'mapper'
+             AS value, count) mapped
+       SELECT cast(value as double) AS value, cast(count as int) AS count
+       SORT BY value, count) sorted
+ SELECT TRANSFORM(value, count)
+ USING 'reducer'
+ AS whatever
+ }}}
  
  == Syntax of Cluster By and Distribute By ==
  
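The ''Simulating Order By'' paragraph relies on the fact that SORT BY only orders rows within each reducer. The Python sketch below is a toy model, not Hive code. Rows are partitioned across a hypothetical number of reducers and each partition is sorted locally. The concatenated output is only globally ordered when there is a single reducer. Taking the first N rows of that single sorted stream corresponds to what a LIMIT clause would keep. The key-modulo partitioner and the sample rows are assumptions chosen for illustration; they are not Hive's actual distribution logic.

```python
from typing import List, Tuple

Row = Tuple[int, str]

def sort_by(rows: List[Row], num_reducers: int) -> List[Row]:
    """Toy model of SORT BY: partition rows across reducers, sort each
    partition locally by key, then concatenate reducer outputs in order."""
    partitions: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        # Hypothetical partitioner: route each row by key modulo reducer count.
        partitions[row[0] % num_reducers].append(row)
    output: List[Row] = []
    for part in partitions:
        output.extend(sorted(part, key=lambda r: r[0]))
    return output

def top_n(rows: List[Row], n: int) -> List[Row]:
    """With one reducer the SORT BY output is totally ordered,
    so its first n rows are the global top n (what LIMIT n keeps)."""
    return sort_by(rows, num_reducers=1)[:n]

rows = [(5, "e"), (2, "b"), (8, "h"), (1, "a"), (7, "g"), (4, "d")]
print(sort_by(rows, num_reducers=2))  # [(2, 'b'), (4, 'd'), (8, 'h'), (1, 'a'), (5, 'e'), (7, 'g')]
print(sort_by(rows, num_reducers=1))  # [(1, 'a'), (2, 'b'), (4, 'd'), (5, 'e'), (7, 'g'), (8, 'h')]
print(top_n(rows, 3))                 # [(1, 'a'), (2, 'b'), (4, 'd')]
```

With two reducers each half is sorted, but the combined output is not. This is why the page suggests a single reducer when an ''ORDER BY''-style result is needed.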

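The ''Setting Types for Sort By'' section says that untyped TRANSFORM output is sorted lexicographically. The Python sketch below models only the comparison and does not call Hive. Sorting the string form of the mapper output puts "10.0" before "2.25", and "12" before "2". Casting first yields numeric order; this is the analogue of `cast(value as double)` and `cast(count as int)`. The tab-separated sample lines are made up for illustration.

```python
from typing import List, Tuple

# Made-up mapper output: tab-separated (value, count) lines, as a TRANSFORM
# script would emit them. Without casts, both columns are compared as strings.
mapper_output = ["9.5\t3", "10.0\t12", "2.25\t7", "10.0\t2"]

def parse_as_strings(lines: List[str]) -> List[Tuple[str, str]]:
    """Split each line into (value, count), keeping both columns as strings."""
    rows: List[Tuple[str, str]] = []
    for line in lines:
        value, count = line.split("\t")
        rows.append((value, count))
    return rows

def cast_columns(rows: List[Tuple[str, str]]) -> List[Tuple[float, int]]:
    """Analogue of cast(value as double), cast(count as int)."""
    return [(float(value), int(count)) for value, count in rows]

string_rows = parse_as_strings(mapper_output)
# Lexicographic order compares character by character: "10.0" < "2.25" < "9.5".
print(sorted(string_rows))
# [('10.0', '12'), ('10.0', '2'), ('2.25', '7'), ('9.5', '3')]

# Numeric order after casting, sorting by value then count like SORT BY value, count.
print(sorted(cast_columns(string_rows)))
# [(2.25, 7), (9.5, 3), (10.0, 2), (10.0, 12)]
```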