hive-user mailing list archives

From Keith Wiley <kwi...@keithwiley.com>
Subject Use distribute to spread across reducers
Date Wed, 02 Oct 2013 18:48:14 GMT
I'm trying to create a subset of a large table for testing.  The following approach works:

create table subset_table as
select * from large_table limit 1000

...but it only uses one reducer.  I would like to speed up the process of creating a subset
by distributing the work across multiple reducers.  I already tried explicitly setting mapred.reduce.tasks
and hive.exec.reducers.max to values larger than 1, but in this particular case, those values
seem to be overridden by Hive's internal query-to-MapReduce conversion; it ignores
those parameters.
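
For concreteness, that attempt would look roughly like the sketch below.  The value 8 is
an arbitrary placeholder (the only requirement stated above is "larger than 1"), and the
table names are the same ones used in the query above.

```sql
-- Hypothetical reducer counts; any value larger than 1 illustrates the attempt.
set mapred.reduce.tasks=8;
set hive.exec.reducers.max=8;

-- Same CTAS as above; the LIMIT query still runs with a single reducer.
create table subset_table as
select * from large_table limit 1000;
```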

So, I tried this:

create table subset_table as
select * from large_table limit 1000
distribute by column_name

...but that doesn't parse.  I get the following error:

OK FAILED: ParseException line 3:0 missing EOF at 'distribute' near '1000'.

I have tried NUMEROUS applications of parentheses, nested queries, etc.  For example, here's
just one (amongst perhaps ten variations on a theme):

create table subset_table as
select * from (
from (
select * from large_table limit 1000
distribute by column_name
)) s

Like I said, I've tried all sorts of combinations of the elements shown above.  So far I have
not even gotten any syntax to parse, much less run.  Only the original query at the top will
even pass the parsing stage of processing.
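
One ordering that may be worth checking: in the Hive SELECT grammar, the DISTRIBUTE BY
clause is listed before LIMIT, which would be consistent with the "missing EOF at
'distribute'" error above.  A minimal sketch with the two clauses swapped, reusing the same
table and column names, follows.  This only addresses the syntax; whether it actually
spreads the work across more reducers is an assumption that has not been verified here.

```sql
-- DISTRIBUTE BY placed before LIMIT, following the clause order in the Hive SELECT grammar.
-- Assumption: this parses as a CTAS; its effect on the reducer count is untested.
create table subset_table as
select * from large_table
distribute by column_name
limit 1000;
```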

Any ideas?

Thanks.

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
                                           --  Galileo Galilei
________________________________________________________________________________

