cassandra-commits mailing list archives

From "Artem Aliev (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8471) mapred/hive queries fail when there is just 1 node down RF is > 1
Date Fri, 12 Dec 2014 17:12:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244458#comment-14244458 ]

Artem Aliev edited comment on CASSANDRA-8471 at 12/12/14 5:11 PM:
------------------------------------------------------------------

The CqlRecordReader is used to read data from C* into map tasks. To connect to C*, it receives
a list of the C* node locations where a given split (row) can be found. It is supposed to check
all of those locations to find an available node for the control connection. But because the
connect call was outside the check loop, the first node in the list was always selected. If that
node is unavailable, the map task fails with the NoHostAvailableException from the issue description.
I just moved the cluster.connect() call into the check loop.
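A minimal, driver-free sketch of the failure mode and the fix described in the comment, assuming Java 16+ (records). `Cluster`, `Session`, `Connector`, `connectBuggy` and `connectFixed` are hypothetical stand-ins, not the DataStax driver classes and not the actual patch in cassandra-2.0-8471.txt. It relies on one fact visible in the stack trace: the host is only contacted inside Cluster.connect() (via Cluster.init and ControlConnection.connect), not when the cluster object is built. So a loop that only builds the cluster can never notice a dead node, and the connect call has to sit inside the loop for failover to the next replica to happen.

```java
import java.util.Set;

/**
 * Hypothetical, driver-free sketch of the CqlRecordReader failover pattern.
 * Cluster, Session and Connector are illustrative stand-ins, NOT the DataStax driver classes.
 */
public class ReplicaFailoverSketch {

    /** Stand-in for a built-but-not-connected cluster: building never touches the network. */
    record Cluster(String contactPoint) {}

    /** Stand-in for an open session on one host. */
    record Session(String host) {}

    /** Hypothetical connector: throws if the host cannot be reached. */
    interface Connector {
        Session connect(Cluster cluster) throws Exception;
    }

    /** Bug shape: the loop only builds a Cluster, which cannot fail, so the first location always wins. */
    static Session connectBuggy(String[] locations, Connector connector) throws Exception {
        Cluster cluster = null;
        Exception lastException = null;
        for (String location : locations) {
            try {
                cluster = new Cluster(location); // cannot fail for a dead host: nothing connects yet
                break;
            } catch (Exception e) {
                lastException = e;
            }
        }
        if (cluster == null) {
            throw new Exception("no usable location", lastException);
        }
        return connector.connect(cluster); // only here is a host contacted, and only the first one
    }

    /** Fixed shape: connect inside the loop so a failure moves on to the next replica. */
    static Session connectFixed(String[] locations, Connector connector) throws Exception {
        Exception lastException = null;
        for (String location : locations) {
            try {
                return connector.connect(new Cluster(location)); // connect inside the check loop
            } catch (Exception e) {
                lastException = e; // this replica is unreachable; try the next one
            }
        }
        throw new Exception("all replica locations failed", lastException);
    }

    public static void main(String[] args) throws Exception {
        Set<String> down = Set.of("node1"); // simulate the first replica being shut down
        Connector connector = cluster -> {
            if (down.contains(cluster.contactPoint())) {
                throw new Exception(cluster.contactPoint() + ": Cannot connect");
            }
            return new Session(cluster.contactPoint());
        };
        String[] locations = {"node1", "node2", "node3"}; // RF=3: three replica locations per split

        System.out.println("fixed: connected to " + connectFixed(locations, connector).host());
        try {
            connectBuggy(locations, connector);
        } catch (Exception e) {
            System.out.println("buggy: " + e.getMessage());
        }
    }
}
```

Running main prints `fixed: connected to node2` followed by `buggy: node1: Cannot connect`. The fixed loop keeps the last failure as the cause, so when every replica is down the final error still explains why.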





was (Author: artem.aliev):
The CqlRecordReader is used to read data from C* into map tasks. To connect to C*, it receives
a list of C* node locations. It is supposed to check all of those locations to find an available
node for the control connection. But because the connect call was outside the check loop, the
first node in the list was always selected. If that node is unavailable, the map task fails with
the NoHostAvailableException from the issue description.
I just moved the cluster.connect() call into the check loop.




> mapred/hive queries fail when there is just 1 node down RF is > 1
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-8471
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8471
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Artem Aliev
>              Labels: easyfix, hadoop, patch
>             Fix For: 2.0.12, 2.1.3
>
>         Attachments: cassandra-2.0-8471.txt
>
>
> The Hive and MapReduce queries fail when just 1 node is down, even with RF=3 (in a 6-node cluster) and default consistency levels for reads and writes.
> The simplest way to reproduce it is to use the DataStax integrated Hadoop environment with Hive.
> {quote}
> alter keyspace "HiveMetaStore" WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ;
> alter keyspace cfs WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ;
> alter keyspace cfs_archive WITH replication = {'class':'NetworkTopologyStrategy', 'DC1':3} ;
> CREATE KEYSPACE datamart WITH replication = {
>   'class': 'NetworkTopologyStrategy',
>   'DC1': '3'
> };
> CREATE TABLE datamart.users1 (
>   id int,
>   name text,
>   PRIMARY KEY ((id))
> );
> {quote}
> Insert some data.
> Shut down one Cassandra node.
> Run a MapReduce job (Hive, in this case):
> {quote}
> $ dse hive
> hive> use datamart;
> hive> select count(*) from users1;
> {quote}
> {quote}
> ...
> ...
> 2014-12-10 18:33:53,090 Stage-1 map = 75%,  reduce = 25%, Cumulative CPU 6.39 sec
> 2014-12-10 18:33:54,093 Stage-1 map = 75%,  reduce = 25%, Cumulative CPU 6.39 sec
> 2014-12-10 18:33:55,096 Stage-1 map = 75%,  reduce = 25%, Cumulative CPU 6.39 sec
> 2014-12-10 18:33:56,099 Stage-1 map = 75%,  reduce = 25%, Cumulative CPU 6.39 sec
> 2014-12-10 18:33:57,102 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.39 sec
> MapReduce Total cumulative CPU time: 6 seconds 390 msec
> Ended Job = job_201412100017_0006 with errors
> Error during job, obtaining debugging information...
> Job Tracking URL: http://i-9d0306706.c.eng-gce-support.internal:50030/jobdetails.jsp?jobid=job_201412100017_0006
> Examining task ID: task_201412100017_0006_m_000005 (and more) from job job_201412100017_0006
> Task with the most failures(4):
> -----
> Task ID:
>   task_201412100017_0006_m_000001
> URL:
>   http://i-9d0306706.c.eng-gce-support.internal:50030/taskdetails.jsp?jobid=job_201412100017_0006&tipid=task_201412100017_0006_m_000001
> -----
> Diagnostic Messages for this Task:
> java.io.IOException: java.io.IOException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException: [i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042] Cannot connect))
> 	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> 	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:244)
> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:538)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:260)
> Caused by: java.io.IOException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException: [i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042] Cannot connect))
> 	at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:206)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:241)
> 	... 9 more
> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042 (com.datastax.driver.core.TransportException: [i-6ac985f7d.c.eng-gce-support.internal/10.240.124.16:9042] Cannot connect))
> 	at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:196)
> 	at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
> 	at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1104)
> 	at com.datastax.driver.core.Cluster.init(Cluster.java:121)
> 	at com.datastax.driver.core.Cluster.connect(Cluster.java:198)
> 	at com.datastax.driver.core.Cluster.connect(Cluster.java:226)
> 	at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:127)
> 	at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:94)
> 	at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:201)
> 	... 10 more
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched:
> Job 0: Map: 4  Reduce: 1   Cumulative CPU: 6.39 sec   HDFS Read: 0 HDFS Write: 0 FAIL
> Total MapReduce CPU Time Spent: 6 seconds 390 msec
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
