hive-user mailing list archives

From Ajo Fod <ajo....@gmail.com>
Subject On bucketing : fewer files than buckets.
Date Mon, 17 Jan 2011 19:03:25 GMT
Hello,

In the documentation I read that each partition gets as many files as
there are buckets. In the sample script below, I created 32 buckets for
the alltrades table, but I find only 2 files in each partition directory.
Am I missing something?

In this sample script, I load a tab-separated file from disk into the
table trades and then transfer the data into alltrades, based on the
example at:
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL/BucketedTables

By the way, another question: how does one put comments in a hive.q file?

-------- sample script ------------
SET hive.enforce.bucketing=TRUE;

CREATE TABLE trades
       (symbol STRING, time STRING, exchange STRING, price FLOAT, volume INT)
PARTITIONED BY (dt STRING)
CLUSTERED BY (symbol)
SORTED BY (time ASC)
INTO 1 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH 'data/2001-05-22'
     INTO TABLE trades
     PARTITION (dt='2001-05-22');

CREATE TABLE alltrades
       (symbol STRING, time STRING, exchange STRING, price FLOAT, volume INT)
PARTITIONED BY (dt STRING)
CLUSTERED BY (symbol)
SORTED BY (time ASC)
INTO 32 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

FROM trades
INSERT OVERWRITE TABLE alltrades
PARTITION (dt='2001-05-22')
SELECT symbol, time, exchange, price, volume
WHERE dt='2001-05-22';
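
For reference, one way to count the files in the partition directory from the Hive CLI is sketched below. The path is an assumption: it presumes the table sits under the default warehouse directory (/user/hive/warehouse), which may not match a given installation.

```sql
-- Assumption: alltrades is stored under the default warehouse directory;
-- adjust the path if hive.metastore.warehouse.dir points elsewhere.
dfs -ls /user/hive/warehouse/alltrades/dt=2001-05-22;
```

With 32 buckets, the listing would be expected to show 32 files for this partition if bucketing was enforced during the insert.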
