hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/Sampling" by ThomasLento
Date Tue, 23 Jun 2009 18:56:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by ThomasLento:
http://wiki.apache.org/hadoop/Hive/LanguageManual/Sampling

------------------------------------------------------------------------------
  table_sample: TABLESAMPLE (BUCKET x OUT OF y [ON colname])
  }}}
  
- The TABLESAMPLE clause allows the users to write queries for samples of the data instead
of the whole table. The TABLESAMPLE clause can be added to any table in the FROM clause. The
buckets are numbered starting from 0. '''colname''' indicates the column on which to sample
each row in the table. colname can be one of the non-partition columns in the table or '''rand()'''
indicating sampling on the entire row instead of an individual column. The rows of the table
are 'bucketed' on the colname randomly into y buckets numbered 0 through y. Rows which belong
to bucket x are returned.  
+ The TABLESAMPLE clause allows the users to write queries for samples of the data instead
of the whole table. The TABLESAMPLE clause can be added to any table in the FROM clause. The
buckets are numbered starting from 1. '''colname''' indicates the column on which to sample
each row in the table. colname can be one of the non-partition columns in the table or '''rand()'''
indicating sampling on the entire row instead of an individual column. The rows of the table
are 'bucketed' on the colname randomly into y buckets numbered 1 through y. Rows which belong
to bucket x are returned.  
  
  In the following example, the 3rd bucket out of the 32 buckets of the table source is selected. 's' is
the table alias.
  {{{

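The bucket-selection rule in the updated paragraph (rows split into y buckets numbered 1 through y, with only bucket x returned) can be modeled outside Hive. What follows is a minimal sketch, not Hive's implementation. It assumes that `ON colname` assigns each row to a bucket by hashing the column value, with `zlib.crc32` standing in for whatever hash Hive actually uses. It also assumes that `ON rand()` can be modeled with a seeded `random.Random`. The function names `bucket_of` and `tablesample` are hypothetical.

```python
import random
import zlib
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def bucket_of(value: Any, y: int) -> int:
    """Map a column value to a 1-based bucket in 1..y.

    Assumption: crc32 of repr(value) is a stand-in hash, not Hive's own.
    """
    h = zlib.crc32(repr(value).encode("utf-8"))
    return h % y + 1


def tablesample(rows: List[Row], x: int, y: int,
                colname: Optional[str] = None, seed: int = 0) -> List[Row]:
    """Hypothetical model of TABLESAMPLE (BUCKET x OUT OF y [ON colname]).

    colname=None models ON rand(): each row gets a pseudo-random bucket.
    A fixed seed keeps that assignment stable across calls.
    """
    if not 1 <= x <= y:
        raise ValueError("x must be in 1..y (buckets are numbered from 1)")
    rng = random.Random(seed)
    sample = []
    for row in rows:
        if colname is None:
            b = rng.randint(1, y)  # inclusive on both ends: 1..y
        else:
            b = bucket_of(row[colname], y)
        if b == x:  # keep only rows that fall in bucket x
            sample.append(row)
    return sample


if __name__ == "__main__":
    rows = [{"id": i} for i in range(1000)]
    # Rough analogue of sampling the 3rd of 32 buckets on column "id".
    s = tablesample(rows, 3, 32, "id")
    print(len(s), "rows in bucket 3 of 32")
```

Because bucket numbers run from 1 to y, taking x = 1, 2, ..., y in turn partitions the table. Every row appears in exactly one sample, which is the property the 1-based numbering fix is meant to make clear.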