hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: dataset
Date Wed, 03 Mar 2010 20:50:59 GMT
Look at writing your own Partitioner implementation to control which
records are sent to which reduce shards.
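
As a rough illustration of that idea, here is a minimal sketch of partitioning
logic that deliberately overloads one reducer. It is plain Java with no Hadoop
dependency so it can be run on its own. The class name SkewedPartitionSketch
and the skewPercent knob are hypothetical, invented for this example. In a real
job the same logic would presumably go in the getPartition(key, value,
numPartitions) method of a subclass of org.apache.hadoop.mapreduce.Partitioner,
registered with job.setPartitionerClass(...).

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not a Hadoop class.
class SkewedPartitionSketch {

    // skewPercent (0-100) is an assumed knob: roughly that share of the
    // key space is forced onto reducer 0, the rest is spread by hash.
    static int getPartition(String key, int numPartitions, int skewPercent) {
        // Clear the sign bit so the hash is never negative.
        int h = key.hashCode() & Integer.MAX_VALUE;
        if (h % 100 < skewPercent) {
            return 0; // the "hot" reducer
        }
        // Use the remaining digits of the hash for the normal spread.
        return (h / 100) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        int[] counts = new int[numPartitions];
        for (int i = 0; i < 100000; i++) {
            counts[getPartition("key-" + i, numPartitions, 60)]++;
        }
        // Prints how many keys each reducer would receive.
        System.out.println(Arrays.toString(counts));
    }
}
```

The decision depends only on the key, so every record with the same key still
reaches the same reducer. Note that this skews the load across reducers; it
does not change the distribution of values in the input data itself.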

- Aaron

On Wed, Mar 3, 2010 at 12:15 PM, Gang Luo <lgpublic@yahoo.com.cn> wrote:

> Hi all,
> I want to generate some datasets with data skew to test my MapReduce jobs.
> I am using TPC-DS, but it seems I cannot control the level of data skew. There
> is a suite from Microsoft that can generate skewed datasets based on
> TPC-D, but it only works on Windows. I haven't succeeded in compiling it on
> Linux yet. Please tell me how I can get a skewed dataset.
>
> Thanks.
> -Gang
