hive-user mailing list archives

From Gopal Vijayaraghavan
Subject Re: what is the difference between ³²and ""
Date Tue, 19 Jan 2016 06:16:27 GMT

>Thank-you so much for your quick response. Yea, the option is used only
>for hive-on-tez. I want to know its source, its principle.

is the better option as it computes the splits after a job has
been submitted.

Imagine you have 3 tables in your query - with, all the splits
have to be generated before the 1st task is spun up.

with, the 1st task can spin up when at least one of the tables
has already generated splits. GetSplits() is not blocking across all
tables - only within 1 table.
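
As a rough illustration of the difference described above, here is a toy model in Python. It is not Hive or Tez code: the table names, the timings, and the function `task_start_times` are all hypothetical. It only shows when each table's first task could start if split generation is a barrier across all tables versus a barrier within one table.

```python
# Toy model only: names and timings are hypothetical, not Hive/Tez APIs.

# Assumed time (in seconds) needed to generate splits for each table.
split_gen_seconds = {"table_a": 2.0, "table_b": 30.0, "table_c": 5.0}


def task_start_times(split_gen, blocking):
    """Return the earliest time a task reading each table can start.

    blocking=True  : every table's splits must exist before any task starts,
                     so all tasks wait for the slowest table (best case, even
                     if the tables are listed in parallel).
    blocking=False : a task only waits for the splits of its own table.
    """
    if blocking:
        barrier = max(split_gen.values())
        return {table: barrier for table in split_gen}
    return dict(split_gen)


up_front = task_start_times(split_gen_seconds, blocking=True)
per_table = task_start_times(split_gen_seconds, blocking=False)

print("first task start, all splits up front:", min(up_front.values()))
print("first task start, per-table split-gen:", min(per_table.values()))
```

In this model the first task starts as soon as the fastest table has its splits, instead of waiting for the slowest one.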

In some cases, you can wait for the 1st task to even finish executing
before starting the split-gen for the 2nd task, producing ~1000x speedups.

For example,

insert into bigtable partition(dt)
select ... from small left outer join bigtable on
date(small.ts) = bigtable.dt and small.txnid = bigtable.txnid
where bigtable.txnid is null

With = true + tez DPP, the split-gen is dynamic and will not
generate splits for 100% of big-table (assuming small table is just today).
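
The pruning idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how Hive or Tez implement it: the partition layout, the file names, and the helpers `dates_seen`, `prune_partitions` and `generate_splits` are all made up for illustration. The point is that once the small-table scan has run, the set of `dt` values is known, and splits only need to be generated for the matching partitions of the big table.

```python
from datetime import datetime

# Hypothetical data: a small table with today's rows only, and a big table
# partitioned by dt, each partition holding some files.
small_rows = [
    {"txnid": 1, "ts": "2016-01-19 06:16:27"},
    {"txnid": 2, "ts": "2016-01-19 09:00:00"},
]
bigtable_partitions = {
    "2016-01-17": ["f0", "f1"],
    "2016-01-18": ["f0"],
    "2016-01-19": ["f0", "f1", "f2"],
}


def dates_seen(rows):
    """Distinct date(ts) values; only known after the small table is scanned."""
    return {datetime.fromisoformat(r["ts"]).date().isoformat() for r in rows}


def prune_partitions(partition_files, wanted_dts):
    """Keep only the partitions whose dt appears on the small side."""
    return {dt: files for dt, files in partition_files.items() if dt in wanted_dts}


def generate_splits(partition_files):
    """One split per file, as a stand-in for real split generation."""
    return [(dt, f) for dt, files in sorted(partition_files.items()) for f in files]


all_splits = generate_splits(bigtable_partitions)
pruned_splits = generate_splits(
    prune_partitions(bigtable_partitions, dates_seen(small_rows))
)

print(len(all_splits), "splits without pruning")
print(len(pruned_splits), "splits with pruning")
```

With a real big table spanning years of partitions, the pruned set is a tiny fraction of the total, which is where the large speedups come from.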

>Maybe this resource
>“” is very

It has diagrams, but here's an original .pptx

MD5 (W-235p-Pandey.pptx) = fd3d5c7eb6360f9654bdfbfb20031ba4

