hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bruce Bian <weidong....@gmail.com>
Subject Re: confused on different behavior of Bucketized tables do not support INSERT INTO
Date Thu, 31 May 2012 14:16:03 GMT
So I did another test on this.
hive> create table test(foo int,bar string) clustered by(foo) sorted by
(foo asc) into 2 buckets;
OK
Time taken: 0.097 seconds
hive> create table test2 (foo int,bar string) clustered by(foo) sorted by
(foo asc) into 2 buckets;
OK
hive> LOAD DATA LOCAL INPATH 'hive/examples/files/kv1.txt' OVERWRITE INTO
TABLE test;
hive> set hive.enforce.bucketing=true;
hive> set hive.enforce.sorting=true;
hive> insert into table test2 select * from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
…………………………………………………………
hive> insert into table test2 select * from test2;
FAILED: Error in semantic analysis: Bucketized tables do not support INSERT
INTO: Table: test2

Seems like the error"FAILED: Error in semantic analysis: Bucketized tables
do not support INSERT INTO: Table: vt_new_data error is occurred
" is only thrown when insert into a bucketized table from the same table?
And when insert into a bucketized table multi-times, it will create a
original_file_copy_n under the same bucket.

-rw-r--r--   3 wbian supergroup       2856 2012-05-31 22:03
/user/hive/warehouse/test2/000000_0
-rw-r--r--   3 wbian supergroup       2856 2012-05-31 22:04
/user/hive/warehouse/test2/000000_0_copy_1
-rw-r--r--   3 wbian supergroup       2956 2012-05-31 22:03
/user/hive/warehouse/test2/000001_0
-rw-r--r--   3 wbian supergroup       2956 2012-05-31 22:04
/user/hive/warehouse/test2/000001_0_copy_1

And since what I want to do is SMB Map Join, the following triggered the
SMB Map Join successfully
set hive.optimize.bucketmapjoin= true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set
hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
select /*+mapjoin(test)*/ * from pokes join test on pokes.foo=test.foo;

So what's the reason for throwing that error(i mean why not support insert
into a bucketized table from the same table)?And isn't that error message
kind of misleading?



On Thu, May 31, 2012 at 6:43 PM, Bruce Bian <weidong.ban@gmail.com> wrote:

> I'm using hive 0.9.0
>
> On Thursday, May 31, 2012, Bruce Bian wrote:
>
>> Hi,
>> I've got a table vt_new_data which is defined as follows:
>> CREATE TABLE VT_NEW_DATA
>> (
>>      V_ACCOUNT_NUM string
>>     ,V_ACCOUNT_MODIFIER_NUM string
>>     ,V_DEPOSIT_TYPE_CD string
>>     ,V_DEPOSIT_TERM int
>>     ,V_LEDGER_SUBJECT_ID string
>>     ,V_ACCOUNTING_ORG_CD string
>>     ,V_OPEN_DT string
>>     ,V_CLOSE_DT string
>>     ,V_CURRENCY_CD string
>>     ,V_ACCOUNT_BAL float
>>     ,V_INNER_MONTH_DELAY_ACCUM float
>> ) CLUSTERED BY (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM) SORTED BY
>> (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM ASC) INTO 256 BUCKETS STORED AS
>> RCFile;
>>
>> when I execute the following query
>> explain insert into table vt_new_data select * from vt_new_data limit 1;
>> (this is just a test)
>> an FAILED: Error in semantic analysis: Bucketized tables do not support
>> INSERT INTO: Table: vt_new_data error is occurred
>>
>> but when I execute the query:
>> explain insert into table vt_new_data
>> select /*+ MAPJOIN(T4) */
>>      t1.account_num as v_account_num
>>     ,t1.account_modifier_num as v_account_modifier_num
>>     ,'3006' as v_deposit_type_cd
>>     ,0 as  v_deposit_term
>>     ,'23201000' v_ledger_subject_id
>>     ,coalesce(t2.party_id,'')  as v_accounting_org_cd
>>     ,coalesce(t3.card_begin_dt,'19000101') as v_open_dt
>>     ,coalesce(t3.card_live_dt,'19000101') as v_close_dt
>>     ,coalesce(t4.currency_cd,substr(t1.account_modifier_num,3,3)) as
>> v_currency_cd
>>     ,coalesce(t4.agt_amt,0) as v_account_bal
>>     ,0 as v_inner_month_delay_accum
>> from t03_e_cash_bucket t1
>> left outer join t03_agt_amount_h_bucket t4
>>     on t1.account_num=t4.account_num
>>     and t1.account_modifier_num=t4.account_modifier_num
>>     and t4.agt_amt_type_cd = '001'
>>     and t4.start_date<='$TXNDATE'
>>     and t4.end_date>'$TXNDATE'
>> left outer join t01_party_card_rela_h_bucket t2
>>     on  t1.card_no = t2.card_no
>>     and t2.party_card_rela_type_cd = '01'
>>     and t2.start_date<='$TXNDATE'
>>     and t2.end_date>'$TXNDATE'
>> left outer join t03_card_bucket t3
>>     on t1.card_no = t3.card_no;
>>
>> the execution plan is generated successfully and triggered an SMB Map
>> Join, which is great.
>>
>> But I don't see the difference here? As both are inserting into a
>> bucketized and sorted table?
>>
>

Mime
View raw message