incubator-cassandra-user mailing list archives

From Shamim <sre...@yandex.ru>
Subject Re: Cassandra + Hadoop - 2 Task attempts with million of rows
Date Mon, 22 Apr 2013 17:50:14 GMT
We are using Hadoop 1.0.3 and Pig 0.11.1.

-- 
Best regards
  Shamim A.

22.04.2013, 21:48, "Shamim" <srecon@yandex.ru>:
> Hello all,
>   recently we upgraded our cluster (6 nodes) from Cassandra version 1.1.6 to 1.2.1.
> Our cluster is evenly partitioned (Murmur3Partitioner). We are using Pig to parse and compute
> aggregate data.
>
> When we submit a job through Pig, what I consistently see is that, while most of the tasks
> have 20-25k rows assigned each (Map input records), only 2 of them (always 2) get more
> than 2 million rows. These 2 tasks always reach 100% and then hang for a long time. Also, most of
> the time we get killed tasks (~2%) with a TimeoutException.
>
> We increased rpc_timeout to 60000 and also set cassandra.input.split.size=1024, but nothing
> helped.
>
> We have roughly 97 million rows in our cluster. Why are we getting the above behavior? Do
> you have any suggestion or clue to troubleshoot this issue? Any help would be highly appreciated.
> Thanks in advance.
>
> --
> Best regards
>   Shamim A.
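
For reference, the split-size setting mentioned above is a Hadoop job property and can be set from within the Pig script itself. A minimal sketch (the keyspace and column family names "MyKeyspace" and "MyCF" are placeholders, not from the original thread; note that rpc_timeout is a server-side setting in cassandra.yaml, not a job property):

```pig
-- cassandra.input.split.size caps the number of rows per input split;
-- a smaller value yields more, smaller map tasks.
SET cassandra.input.split.size '1024';

-- Load the column family through the Pig storage handler shipped with
-- Cassandra; 'MyKeyspace' and 'MyCF' are placeholder names.
rows = LOAD 'cassandra://MyKeyspace/MyCF'
       USING org.apache.cassandra.hadoop.pig.CassandraStorage();
```

The same property can also be passed on the command line with `pig -Dcassandra.input.split.size=1024`, which applies it to every script in that invocation.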
