cassandra-user mailing list archives

From Ben Frank <...@airlust.com>
Subject Cassandra Hadoop integration issue using CFIF
Date Wed, 29 Aug 2012 19:32:02 GMT
Hey all,
    I'm having an issue using ColumnFamilyInputFormat in a Hadoop job. The
mappers spin out of control and keep reading records over and over,
never reaching the end. I have a CF with wide rows (although none has more
than about 5 columns at the moment), and I've tried setting wide rows to both
true and false. If I turn on debugging, I get what look like strange input
splits (note the -1):

hadoop.ColumnFamilyInputFormat: partitioner is org.apache.cassandra.dht.RandomPartitioner@203727c5
hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((127605887595351923798765477786913079296, '-1] @[cass1, cass2, cass3])
hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((-1, '0] @[cass1, cass2, cass3])
hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((0, '42535295865117307932921825928971026432] @[cass2, cass3, cass4])
hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((42535295865117307932921825928971026432, '85070591730234615865843651857942052864] @[cass3, cass4, cass1])
hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((85070591730234615865843651857942052864, '127605887595351923798765477786913079296] @[cass4, cass1, cass2])

If I debug in Eclipse (with widerows=false), I see that this call in
ColumnFamilyRecordReader.StaticRowIterator.maybeInit() is setting
startToken to -1:

startToken = partitioner.getTokenFactory().toString(partitioner
.getToken(Iterables.getLast(rows).key));
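
To see where a -1 could come from in that call: assuming RandomPartitioner in 1.1 hashes a non-empty key with MD5 and takes the absolute value (so real keys land in [0, 2^127]), while an empty key maps to the minimum token -1, a token computed from an actual row key should only be -1 if the key is empty. A hypothetical stand-alone sketch of that assumed behaviour (not Cassandra's code):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TokenSketch {
    static final BigInteger MINIMUM = BigInteger.ONE.negate();

    // Assumed RandomPartitioner behaviour: an empty key maps to the minimum
    // token; any other key maps to abs(MD5(key)) read as a signed BigInteger.
    static BigInteger token(byte[] key) {
        if (key.length == 0) {
            return MINIMUM;
        }
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(key);
            return new BigInteger(digest).abs(); // lands in [0, 2^127]
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(token("row1".getBytes(StandardCharsets.UTF_8)));
        System.out.println(token(new byte[0])); // prints -1
    }
}
```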

I'm using Cassandra 1.1.2 on a 4-node cluster with a replication factor of 3,
and Hadoop 0.20.1. Here's the output of nodetool ring:

Address         DC          Rack        Status State   Load        Effective-Ownership Token
                                                                                       127605887595351923798765477786913079296
129.19.63.126   datacenter1 rack1       Up     Normal  46.91 GB    75.00%              0
129.19.63.127   datacenter1 rack1       Up     Normal  49.45 GB    75.00%              42535295865117307932921825928971026432
129.19.63.128   datacenter1 rack1       Up     Normal  43.19 GB    75.00%              85070591730234615865843651857942052864
129.19.63.129   datacenter1 rack1       Up     Normal  46.9 GB     75.00%              127605887595351923798765477786913079296
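
The ring itself looks balanced. Assuming evenly spaced initial tokens of i * 2^127 / N for RandomPartitioner, the sketch below reproduces the four tokens above; and with a single data center, evenly spaced tokens and RF 3, each node's effective ownership should be RF / N = 3/4 = 75%, which matches the output. Helper names are hypothetical.

```java
import java.math.BigInteger;

public class RingSketch {
    // Evenly spaced initial tokens over RandomPartitioner's 2^127 token space.
    static BigInteger[] balancedTokens(int nodes) {
        BigInteger ringSize = BigInteger.valueOf(2).pow(127);
        BigInteger[] tokens = new BigInteger[nodes];
        for (int i = 0; i < nodes; i++) {
            tokens[i] = ringSize.multiply(BigInteger.valueOf(i))
                                .divide(BigInteger.valueOf(nodes));
        }
        return tokens;
    }

    // With evenly spaced tokens in one data center, each node holds replicas
    // for replicationFactor of the N equal-sized ranges.
    static double effectiveOwnershipPercent(int replicationFactor, int nodes) {
        return 100.0 * replicationFactor / nodes;
    }

    public static void main(String[] args) {
        for (BigInteger t : balancedTokens(4)) {
            System.out.println(t);
        }
        System.out.println(effectiveOwnershipPercent(3, 4) + "%"); // prints 75.0%
    }
}
```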

Does anyone have any idea what's going on here? I'm assuming the splits are
wrong, so I'm going to focus on figuring out what's up with them. Is there
anything else I should look at?

-Ben
