hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject Re: dataset
Date Wed, 03 Mar 2010 21:06:05 GMT
That is a good idea, but it doesn't work in my case. What I want to do is test how well my partitioner
divides the workload. It is supposed to counteract skew, not generate it, so I still need a
skewed data source. Any ideas?
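One common way to get a skewed source whose skew level you can tune, independent of TPC-DS, is to draw keys from a Zipf distribution. Key k out of n appears with probability proportional to 1/k^s, so s = 0 gives uniform keys and larger s concentrates more records on a few hot keys. The Java sketch below assumes that approach. The class name `ZipfKeyGenerator`, its parameters, and the tab-separated output format are hypothetical choices, not part of Hadoop or TPC-DS.

```java
import java.util.Random;

/**
 * Hypothetical sketch: generates integer keys 1..n whose frequencies follow
 * a Zipf distribution with exponent s. s = 0 gives uniform keys; a larger s
 * puts more records on the first few keys, i.e. more skew.
 */
class ZipfKeyGenerator {
    private final double[] cdf; // cumulative probabilities for keys 1..n
    private final Random rng;

    ZipfKeyGenerator(int n, double s, long seed) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        if (s < 0) throw new IllegalArgumentException("s must be non-negative");
        cdf = new double[n];
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, s); // unnormalized weight of key k
            cdf[k - 1] = sum;
        }
        for (int i = 0; i < n; i++) {
            cdf[i] /= sum; // normalize into a proper CDF
        }
        cdf[n - 1] = 1.0; // guard against floating-point round-off
        rng = new Random(seed);
    }

    /** Draws one key in [1, n]: the smallest key whose CDF value exceeds u. */
    int next() {
        double u = rng.nextDouble(); // uniform in [0, 1)
        int lo = 0, hi = cdf.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] > u) hi = mid; else lo = mid + 1;
        }
        return lo + 1;
    }

    public static void main(String[] args) {
        // Emit tab-separated "key<TAB>payload" lines usable as MapReduce text input.
        ZipfKeyGenerator gen = new ZipfKeyGenerator(1000, 1.2, 42L);
        for (int i = 0; i < 20; i++) {
            System.out.println(gen.next() + "\t" + "payload" + i);
        }
    }
}
```

Redirecting the output to a file and copying it into HDFS (for example with `hadoop fs -put`) would give an input file whose key skew can be raised or lowered just by changing s.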



----- Original Message ----
From: Aaron Kimball <aaron@cloudera.com>
To: common-user@hadoop.apache.org
Sent: 2010/3/3 (Wed) 3:50:59 PM
Subject: Re: dataset

Look at implementing your own Partitioner to control which records are
sent to which reduce shards.
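For reference, the kind of routing control described above might look roughly like the sketch below. To stay self-contained it does not import Hadoop; in a real job the same logic would go inside a subclass of `org.apache.hadoop.mapreduce.Partitioner` overriding `getPartition(KEY key, VALUE value, int numPartitions)`. The class name `HotKeyPartitionFunction` and the fixed set of "hot" keys are hypothetical illustrations. All other keys use the `(hashCode() & Integer.MAX_VALUE) % numPartitions` formula of Hadoop's default `HashPartitioner`, applied here to `String.hashCode()` rather than to a Hadoop `Text` key.

```java
import java.util.Set;

/**
 * Hypothetical sketch of a partition function that deliberately sends every
 * record with a "hot" key to partition 0, and hash-partitions the rest.
 */
class HotKeyPartitionFunction {
    private final Set<String> hotKeys;

    HotKeyPartitionFunction(Set<String> hotKeys) {
        this.hotKeys = hotKeys;
    }

    int getPartition(String key, int numPartitions) {
        if (hotKeys.contains(key)) {
            return 0; // overload reducer 0 on purpose
        }
        // Clear the sign bit so the modulus is never negative (HashPartitioner's formula).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

This skews the reduce side only; it does not change the input data itself.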

- Aaron

On Wed, Mar 3, 2010 at 12:15 PM, Gang Luo <lgpublic@yahoo.com.cn> wrote:

> Hi all,
> I want to generate some datasets with data skew to test my MapReduce jobs.
> I am using TPC-DS, but it seems I cannot control the level of data skew. There
> is a suite from Microsoft that can generate skewed datasets based on
> TPC-D, but it only works on Windows. I haven't succeeded in compiling it on
> Linux yet. Please tell me how I can get a skewed dataset.
> Thanks.
> -Gang

