cassandra-commits mailing list archives

From "Alex Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5544) Hadoop jobs assigns only one mapper in task
Date Tue, 28 May 2013 17:04:22 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668449#comment-13668449 ]

Alex Liu commented on CASSANDRA-5544:
-------------------------------------

[~shamim] I think you have already found the answer: SET pig.noSplitCombination true, so that Pig
doesn't combine the small splits into one mapper. HBase's internal code does the same. I found that
C* 1.2.1 updated Pig from version 0.9.0 to 0.10.0, which may explain the change in behavior.
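To illustrate why split combination can collapse a job into a single mapper, here is a simplified, hypothetical model (not Pig's actual implementation, which also takes split locations into account): splits are greedily packed into groups whose total reported length stays under a size limit, and each group runs as one mapper. If every split reports a small length, they all fit into one group. The `noSplitCombination` flag below only mimics the effect of `pig.noSplitCombination=true`; the split lengths and the 64 MB limit are made-up values for the example.

```java
import java.util.ArrayList;
import java.util.List;

class SplitCombinationSketch {
    // Greedily packs splits (by reported length) into groups no larger than maxCombinedSize.
    // Each returned group corresponds to one mapper. Simplified model, not Pig's real code.
    static List<List<Long>> combine(List<Long> splitLengths, long maxCombinedSize, boolean noSplitCombination) {
        List<List<Long>> groups = new ArrayList<>();
        if (noSplitCombination) {
            // One mapper per split, mimicking pig.noSplitCombination=true
            for (long len : splitLengths) {
                List<Long> single = new ArrayList<>();
                single.add(len);
                groups.add(single);
            }
            return groups;
        }
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long len : splitLengths) {
            // Start a new group when adding this split would exceed the limit
            if (!current.isEmpty() && currentSize + len > maxCombinedSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(len);
            currentSize += len;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical: 8 splits that each report 1 KB, with a 64 MB combine limit
        List<Long> lengths = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            lengths.add(1024L);
        }
        long limit = 64L * 1024 * 1024;
        System.out.println("combined mappers: " + combine(lengths, limit, false).size());
        System.out.println("uncombined mappers: " + combine(lengths, limit, true).size());
    }
}
```

With these inputs the combined case yields 1 mapper and the uncombined case yields 8, which matches the symptom described in the issue.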

As far as points 4) and 5) are concerned, I think the empty maps and big maps are due to data
skew. If you first print out the splits, you can check the rows in each split.
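Once the splits have been printed and rows counted per split, a quick check like the following can flag skew. This is a hypothetical helper, not part of Cassandra or Pig: the token-range labels and row counts are invented, and the "more than 2x the mean" threshold is an arbitrary choice for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SplitSkewCheck {
    // Labels each split EMPTY (no rows), OVERSIZED (more than factor * mean rows) or OK.
    // The factor threshold is an illustrative assumption, not a Cassandra setting.
    static Map<String, String> classify(Map<String, Long> rowsPerSplit, double factor) {
        long total = 0;
        for (long rows : rowsPerSplit.values()) {
            total += rows;
        }
        double mean = rowsPerSplit.isEmpty() ? 0.0 : (double) total / rowsPerSplit.size();
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : rowsPerSplit.entrySet()) {
            long rows = e.getValue();
            String label;
            if (rows == 0) {
                label = "EMPTY";
            } else if (rows > factor * mean) {
                label = "OVERSIZED";
            } else {
                label = "OK";
            }
            result.put(e.getKey(), label);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical row counts keyed by token range; not data from this issue
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("(0, 100]", 0L);
        counts.put("(100, 200]", 1200L);
        counts.put("(200, 300]", 900000L);
        counts.put("(300, 400]", 1500L);
        classify(counts, 2.0).forEach((split, label) -> System.out.println(split + " -> " + label));
    }
}
```

An empty map then corresponds to a split with no rows, and a "big map" to a split holding far more rows than the others.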

I will add the following line to CassandraStorage.java:

job.getConfiguration().setBoolean("pig.noSplitCombination", true);
                
> Hadoop jobs assigns only one mapper in task 
> --------------------------------------------
>
>                 Key: CASSANDRA-5544
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5544
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 1.2.1
>         Environment: Red Hat Linux 5.4, Hadoop 1.0.3, pig 0.11.1
>            Reporter: Shamim Ahmed
>            Assignee: Alex Liu
>         Attachments: Screen Shot 2013-05-26 at 4.49.48 PM.png
>
>
> We have seen very strange behaviour from our Hadoop cluster after upgrading
> Cassandra from 1.1.5 to 1.2.1. We have a 5-node Cassandra cluster, three nodes of
> which are Hadoop slaves. Now when we submit a job through a Pig script, only one
> map task is assigned, running on one of the Hadoop slaves, regardless of the
> volume of data (we already tried with more than a million rows).
> Pig is configured as follows:
> export PIG_HOME=/oracle/pig-0.10.0
> export PIG_CONF_DIR=${HADOOP_HOME}/conf
> export PIG_INITIAL_ADDRESS=192.168.157.103
> export PIG_RPC_PORT=9160
> export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
> We also have the following properties in Hadoop:
>  <property>
>  <name>mapred.tasktracker.map.tasks.maximum</name>
>  <value>10</value>
>  </property>
>  <property>
>  <name>mapred.map.tasks</name>
>  <value>4</value>
>  </property>

--
This message is automatically generated by JIRA.
