incubator-cassandra-dev mailing list archives

From Jeremy Hanna <>
Subject Re: range ghosts and more with hadoop support (with proposed solution)
Date Mon, 04 Jul 2011 15:55:06 GMT

On Jul 1, 2011, at 9:09 PM, Jeremy Hanna wrote:

> We think we're running into a situation where we've deleted all of the columns on several
thousand rows, but those rows still show up in the results of our Pig scripts.  We believe that's a
product of range ghosts, because ColumnFamilyRecordReader uses getRangeSlices.  That could
be a problem for other people too, and I think we have something that might address it.
> What if we were to add a Hadoop-job-specific option that tells the CFRR to filter out returned
rows that don't contain any columns?  It's true that core Cassandra used to do this filtering
and that the feature was removed because of the performance penalty.  With Hadoop-style
loads, however, latency isn't as big of a deal, and it would be a job-specific option anyway.  Also,
the CFRR takes a SlicePredicate, so in addition to being able to suppress range
ghosts, the option could skip rows that have no data for that SlicePredicate - which would also
be a nice feature, since such rows have similarly undesirable consequences.  True, the person
writing the MapReduce job or the Pig script could deal with empty rows at that level.  However,
this is core enough - and could be made optional - that people shouldn't have to do checking
all over the place for keys without any columns.
> Would such an option be okay to add to the Hadoop config and to the CFRR?
> Jeremy
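
For illustration, the proposed filtering could look something like the sketch below. This is a hypothetical simplification, not Cassandra's actual CFRR code: the `filterEmptyRows` helper and the `skipEmpty` flag are invented names standing in for the proposed job option, and rows are modeled as key-to-column-map entries rather than real Thrift `KeySlice` results.

```java
import java.util.*;

// Hypothetical sketch of the proposed option: before handing rows from a
// getRangeSlices-style scan to the mapper, drop rows whose column map is
// empty (range ghosts, or rows with no data for the SlicePredicate).
public class SkipEmptyRows
{
    // A row is modeled as (key -> columns); in the real CFRR these would be
    // KeySlice results deserialized from Thrift.
    static List<Map.Entry<String, SortedMap<String, byte[]>>> filterEmptyRows(
            List<Map.Entry<String, SortedMap<String, byte[]>>> rows,
            boolean skipEmpty)
    {
        if (!skipEmpty)
            return rows; // current behavior: ghosts pass through to the job

        List<Map.Entry<String, SortedMap<String, byte[]>>> live = new ArrayList<>();
        for (Map.Entry<String, SortedMap<String, byte[]>> row : rows)
            if (!row.getValue().isEmpty()) // suppress rows with no columns
                live.add(row);
        return live;
    }

    public static void main(String[] args)
    {
        SortedMap<String, byte[]> cols = new TreeMap<>();
        cols.put("name", "value".getBytes());

        List<Map.Entry<String, SortedMap<String, byte[]>>> rows = new ArrayList<>();
        rows.add(new AbstractMap.SimpleEntry<>("live-row", cols));
        rows.add(new AbstractMap.SimpleEntry<>("ghost-row",
                new TreeMap<String, byte[]>()));

        // With the option on, only the live row survives.
        System.out.println(filterEmptyRows(rows, true).size());  // prints 1
        // With the option off, the ghost is still returned, as today.
        System.out.println(filterEmptyRows(rows, false).size()); // prints 2
    }
}
```

Since the check is per-row and linear in the batch size, the cost is negligible next to the range-slice round trips themselves, which is why it seems acceptable for Hadoop-style workloads even though it was too expensive as a default in the read path.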
