If I take the exact same SlicePredicate that fails in the Hadoop example and pass it to a multiget_slice call, the data is returned successfully.  So it appears the problem does lie somewhere in the tie-in to Hadoop.

I will try to put together a minimal, self-contained example that demonstrates the failure and post it here.  I was just hoping there might be an easy fix recognizable from my description before I had to resort to that...
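One possible first step toward that standalone reproduction is to rule out the assumption in point 5 of the original report: that every requested column name is byte-for-byte identical to one that was inserted. The sketch below is plain Java with no Cassandra or Hadoop dependencies. It assumes the long column names were encoded big-endian with java.nio.ByteBuffer (its default byte order); the class ColumnNameCheck, its helper methods, and the sample dates are hypothetical, for illustration only.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: checks that requested column names match inserted ones byte-for-byte.
public class ColumnNameCheck {

    // Encode a long (e.g. a date in epoch millis) as an 8-byte column name.
    // ByteBuffer defaults to big-endian order (assumption: the data was written this way).
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    // Decode an 8-byte column name back into a long.
    static long decodeLong(byte[] name) {
        if (name.length != 8) {
            throw new IllegalArgumentException("expected 8 bytes, got " + name.length);
        }
        return ByteBuffer.wrap(name).getLong();
    }

    // Return the requested names (decoded as longs) that have no exact match among the inserted names.
    static List<Long> missingNames(List<byte[]> inserted, List<byte[]> requested) {
        List<Long> missing = new ArrayList<>();
        for (byte[] req : requested) {
            boolean found = false;
            for (byte[] ins : inserted) {
                // byte[].equals() compares identity, so compare contents with Arrays.equals().
                if (Arrays.equals(req, ins)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(decodeLong(req));
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical dates in epoch millis (2010-05-03, 2010-05-04, 2010-05-05 UTC).
        List<byte[]> inserted = new ArrayList<>();
        for (long t : new long[] {1272844800000L, 1272931200000L, 1273017600000L}) {
            inserted.add(encodeLong(t));
        }
        // One name that was inserted, one that was not (2010-05-06 UTC).
        List<byte[]> requested = Arrays.asList(encodeLong(1272931200000L), encodeLong(1273104000000L));
        System.out.println("missing: " + missingNames(inserted, requested)); // prints "missing: [1273104000000]"
    }
}
```

The comparison uses Arrays.equals rather than equals because Java arrays use identity equality; the same caveat applies to any code that puts byte[] column names into a HashMap or HashSet.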


On Tue, May 4, 2010 at 1:40 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
Can you reproduce outside the Hadoop environment, i.e. w/ Thrift code?

On Mon, May 3, 2010 at 5:49 AM, Mark Schnitzius
<mark.schnitzius@cxense.com> wrote:
> Hi all...  I am trying to feed a specific list of Cassandra column names in
> as input to a Hadoop process, but for some reason it only feeds in some of
> the columns I specify, not all.
> This is a short description of the problem - I'll see if anyone might have
> some insight before I dump a big load of code on you...
> 1.  I've uploaded a bunch of data into Cassandra, with the column names
> stored as longs (dates, basically) converted to byte[8].
> 2.  I can successfully set a SlicePredicate using setSlice_range to return
> all the data for a set of columns.
> 3.  However, if I instead call setColumn_names on the SlicePredicate, only
> some of the specified columns get fed into Hadoop.
> 4.  This faulty behavior is repeatable, with the same columns going missing
> each time for the same input parameters.
> 5.  For the values that fail, I've made fairly certain that the value for
> the column name is getting inserted successfully, and that the exact same
> column name is specified in the call to setColumn_names.
> Any clues?
> AdTHANKSvance,
> Mark

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support