cassandra-user mailing list archives

From Ben Frank <...@airlust.com>
Subject Re: Cassandra Hadoop integration issue using CFIF
Date Wed, 29 Aug 2012 23:48:30 GMT
The line below always returns "0" because the key ByteBuffer has already been
read from.

startToken = partitioner.getTokenFactory().toString(
    partitioner.getToken(Iterables.getLast(rows).key));

I was able to get it to work by using .mark() and .reset() on the buffer.
I'll log a bug, but I'm confused as to why no one else is running into this.
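
To illustrate the buffer behaviour described above, here is a minimal sketch
using only java.nio.ByteBuffer. The class and method names (ByteBufferKeyDemo,
consume, peek) are hypothetical and do not come from the Cassandra code; the
sketch only shows that a relative read advances the buffer's position, so a
later reader sees zero remaining bytes, and that mark()/reset() restores it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferKeyDemo {
    // Relative bulk get(): copies the remaining bytes and advances position
    // to limit, leaving nothing for the next reader of the same buffer.
    static byte[] consume(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Same read, but mark() remembers the position beforehand and reset()
    // rewinds to it, so the buffer can be read again afterwards.
    static byte[] peek(ByteBuffer buf) {
        buf.mark();
        try {
            return consume(buf);
        } finally {
            buf.reset();
        }
    }

    public static void main(String[] args) {
        // Hypothetical row key, used only for illustration.
        ByteBuffer key = ByteBuffer.wrap("row-key".getBytes(StandardCharsets.UTF_8));

        peek(key);
        System.out.println("after peek:    " + key.remaining() + " bytes left");

        consume(key);
        System.out.println("after consume: " + key.remaining() + " bytes left");
    }
}
```

Calling duplicate() on the buffer before reading is another common way to
leave the original position untouched; which option fits best depends on the
calling code.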

-Ben

On Wed, Aug 29, 2012 at 12:32 PM, Ben Frank <ben@airlust.com> wrote:

> Hey all,
>     I'm having an issue using ColumnFamilyInputFormat in a Hadoop job.
> The mappers spin out of control and just keep reading records over and
> over, never getting to the end. I have a CF with wide rows (although none is
> past about 5 columns at the moment), and I've tried setting wide rows to
> both true and false. If I turn on debugging, I get what seem like strange
> input splits being created (see the -1):
>
> hadoop.ColumnFamilyInputFormat: partitioner is org.apache.cassandra.dht.RandomPartitioner@203727c5
> hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((127605887595351923798765477786913079296, '-1] @[cass1, cass2, cass3])
> hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((-1, '0] @[cass1, cass2, cass3])
> hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((0, '42535295865117307932921825928971026432] @[cass2, cass3, cass4])
> hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((42535295865117307932921825928971026432, '85070591730234615865843651857942052864] @[cass3, cass4, cass1])
> hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((85070591730234615865843651857942052864, '127605887595351923798765477786913079296] @[cass4, cass1, cass2])
>
> If I debug in Eclipse (with widerows=false), I see that this call in
> ColumnFamilyRecordReader.StaticRowIterator.maybeInit() is setting
> startToken to -1:
>
> startToken = partitioner.getTokenFactory().toString(partitioner
> .getToken(Iterables.getLast(rows).key));
>
> I'm using Cassandra 1.1.2 with a 4-node cluster, a replication factor of 3,
> and Hadoop 0.20.1. Here's the output of nodetool ring:
>
> Address         DC          Rack   Status  State   Load      Effective-Ownership  Token
>                                                                                   127605887595351923798765477786913079296
> 129.19.63.126   datacenter1 rack1  Up      Normal  46.91 GB  75.00%               0
> 129.19.63.127   datacenter1 rack1  Up      Normal  49.45 GB  75.00%               42535295865117307932921825928971026432
> 129.19.63.128   datacenter1 rack1  Up      Normal  43.19 GB  75.00%               85070591730234615865843651857942052864
> 129.19.63.129   datacenter1 rack1  Up      Normal  46.9 GB   75.00%               127605887595351923798765477786913079296
>
> Does anyone have any idea what's going on here? I'm assuming the splits are
> wrong, so I'm going to focus on seeing what's up with that. Is there
> anything else I should look at?
>
> -Ben
>
