hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/Sampling" by AMammenT
Date Thu, 06 Aug 2009 23:36:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by AMammenT:
http://wiki.apache.org/hadoop/Hive/LanguageManual/Sampling

The comment on the change is:
Clear up confusion around cluster vs. bucket and how they interact.  

------------------------------------------------------------------------------
  
  So in the above example, if table 'source' was created with 'CLUSTERED BY id INTO 32 BUCKETS'

  {{{
-     TABLESAMPLE(BUCKET 3 OUT OF 16) 
+     TABLESAMPLE(BUCKET 3 OUT OF 16 ON id) 
  }}}
- would pick out the 3rd and 19th buckets. 
+ would pick out the 3rd and 19th clusters as each bucket would be composed of (32/16)=2 clusters.

  
  On the other hand the tablesample clause
  {{{
      TABLESAMPLE(BUCKET 3 OUT OF 64 ON id) 
  }}}
- would pick out half of the 3rd bucket. 
+ would pick out half of the 3rd cluster as each bucket would be composed of (32/64)=1/2 of a cluster.
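
The bucket-to-cluster arithmetic in the two examples above can be checked with a small sketch. This is not Hive code: the function name `sampled_clusters` is hypothetical, and it assumes that sample bucket x of y corresponds to rows whose hash modulo y equals x-1, and that the sample's bucket count either divides or is a multiple of the table's cluster count.

```python
from fractions import Fraction


def sampled_clusters(num_clusters, x, y):
    """Model TABLESAMPLE(BUCKET x OUT OF y ON col) against a table
    created with CLUSTERED BY col INTO num_clusters BUCKETS.

    Returns (clusters, fraction): the 1-based cluster numbers the sample
    touches, and the share of each touched cluster that is read.
    Assumption: bucket x of y holds rows with hash % y == x - 1.
    """
    if not 1 <= x <= y:
        raise ValueError("x must be between 1 and y")
    if y <= num_clusters:
        if num_clusters % y != 0:
            raise ValueError("y must divide num_clusters")
        # Each sample bucket is made of num_clusters / y whole clusters,
        # spaced y apart.
        return list(range(x, num_clusters + 1, y)), Fraction(1)
    if y % num_clusters != 0:
        raise ValueError("y must be a multiple of num_clusters")
    # Each sample bucket is only a num_clusters / y slice of one cluster.
    cluster = (x - 1) % num_clusters + 1
    return [cluster], Fraction(num_clusters, y)


print(sampled_clusters(32, 3, 16))
print(sampled_clusters(32, 3, 64))
```

Under these assumptions the first call reports clusters 3 and 19 read in full, and the second reports half of cluster 3, matching the wording of the updated page.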
  
