incubator-cassandra-user mailing list archives

From Uldis Barbans
Subject Fwd: Cassandra BulkOutputFormat with Hadoop MRv1
Date Mon, 12 Nov 2012 18:22:24 GMT

Is BulkOutputFormat intended to be compatible with the MRv1 (mapred)
API at all? I'm trying to write to Cassandra, roughly following the
example but with MRv1 - that is, calling output.collect(rowkey,
Collections.singletonList(mutation)); in my mapper-only job.

The job appears to succeed, but no data appears in Cassandra. A task
attempt's syslog:
2012-11-12 16:21:12,806 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2012-11-12 16:21:12,875 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2012-11-12 16:21:12,993 WARN org.apache.hadoop.conf.Configuration: is deprecated. Instead, use dfs.metrics.session-id
2012-11-12 16:21:12,993 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2012-11-12 16:21:13,020 WARN org.apache.hadoop.conf.Configuration: is deprecated. Instead, use dfs.datanode.hostname
2012-11-12 16:21:13,304 INFO org.apache.hadoop.util.ProcessTree:
setsid exited with exit code 0
2012-11-12 16:21:13,307 INFO org.apache.hadoop.mapred.Task:  Using
ResourceCalculatorPlugin :
2012-11-12 16:21:14,481 WARN mapreduce.Counters: Counter name
MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group
name and  BYTES_READ as counter name instead
2012-11-12 16:21:14,486 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2012-11-12 16:21:15,176 INFO org.apache.cassandra.utils.CLibrary: JNA
not found. Native methods will be disabled.
2012-11-12 16:21:15,565 INFO Opening
/[anonymized]/keyspace/columnfamily/keyspace-columnfamily-hf-1 (36358
2012-11-12 16:21:15,570 INFO org.apache.hadoop.mapred.Task:
Task:attempt_201210082133_102094_m_000000_0 is done. And is in the
process of commiting
2012-11-12 16:21:15,724 INFO org.apache.hadoop.mapred.Task: Task
'attempt_201210082133_102094_m_000000_0' done.
2012-11-12 16:21:15,727 INFO
org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1
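
The interval between the "Opening" line and the task's two completion lines can be read off the timestamps in the log above. Below is a minimal sketch of that arithmetic, assuming Java 8 or later for java.time. The class name LogGap and the helper millisBetween are made up for illustration and are not part of Hadoop or Cassandra.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogGap {
    // Timestamp layout used in the syslog above: date, time, comma, milliseconds.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    // Hypothetical helper: milliseconds elapsed between two log timestamps.
    static long millisBetween(String earlier, String later) {
        LocalDateTime a = LocalDateTime.parse(earlier, FMT);
        LocalDateTime b = LocalDateTime.parse(later, FMT);
        return Duration.between(a, b).toMillis();
    }

    public static void main(String[] args) {
        String opened     = "2012-11-12 16:21:15,565"; // "Opening ... -hf-1"
        String committing = "2012-11-12 16:21:15,570"; // "is done. And is in the process of commiting"
        String done       = "2012-11-12 16:21:15,724"; // "Task '...' done."

        System.out.println("opened -> committing: "
                + millisBetween(opened, committing) + " ms");
        System.out.println("opened -> done: "
                + millisBetween(opened, done) + " ms");
    }
}
```

This only measures the gap between log lines. It says nothing about whether any streaming was started or awaited in that window.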

So an sstable was apparently opened successfully, and I'm suspicious
that the Task says it's done a decisecond later. It seems
the framework never waits for BulkOutputFormat's streaming to even
Any suggestions?

