hive-user mailing list archives

From 村下瑛 <>
Subject Number of mappers is always 1 for external Parquet tables.
Date Wed, 24 Dec 2014 09:27:50 GMT
Hi, all

I am trying to load Pig output into Hive as an external table,
and I am currently stuck: Hive always sets the number of mappers to 1,
even though the table has more than 10 million records and is composed of
multiple part files. Does anyone have any idea?

To be more specific, the output is in Parquet format, generated by a Pig
script without any compression.

STORE rows INTO '/table-data/test' USING parquet.pig.ParquetStorer;

The directory does contain 16 part-m-00xx.parquet files and _metadata.
And the external table is pointed to the directory.

Here is the CREATE TABLE statement I've used (column list abridged; the
table name and location match the query and directory above):

CREATE EXTERNAL TABLE `t_main_wop` (
  `id` string,
  `f1` string,
  ...
)
STORED AS PARQUET
LOCATION '/table-data/test';

Hive does seem to read the Parquet files themselves properly, since a
simple SELECT returns the correct results.

However, every time I give it a query that requires a MapReduce job,
it uses only a single mapper and takes forever.

hive> select count(*) from t_main_wop;
Query ID = xxx
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_yyy, Tracking URL = zzz
Kill Command = hadoop_job  -kill job_yyy
Hadoop job information for Stage-1: number of mappers: 1; number of
reducers: 1
2014-12-24 02:49:46,912 Stage-1 map = 0%,  reduce = 0%
2014-12-24 02:50:45,847 Stage-1 map = 0%,  reduce = 0%

Why is this? I've tried adjusting settings, but to no avail.
Again, the directory contains 16 part files, so I think Hive should be
able to use at least 16 mappers.
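For reference, these are the kinds of settings I understand can influence how many mappers Hive launches. The property names come from the Hive/Hadoop configuration documentation; the values below are only illustrative, not something I have confirmed helps here:

```sql
-- Illustrative only: split-size and input-format settings that can affect
-- the number of mappers (values are examples, not tested against this table).

-- Cap the split size so large inputs are divided across more mappers:
SET mapred.max.split.size=67108864;                          -- 64 MB (old-style name)
SET mapreduce.input.fileinputformat.split.maxsize=67108864;  -- 64 MB (new-style name)

-- Hive's default CombineHiveInputFormat can merge many files into one split;
-- switching to the plain HiveInputFormat keeps splits per file:
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
```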

I would really appreciate any suggestions.

