incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Feeding in specific Cassandra columns into Hadoop
Date Tue, 04 May 2010 04:27:08 GMT
We serialize the SlicePredicate as part of the Hadoop Configuration
string.  It's quite possible that either

 - one of your column names is exposing a bug in the Thrift json serializer
 - Hadoop is silently truncating large predicates

You should test that getSlicePredicate(conf).equals(originalPredicate)
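
A minimal sketch of that kind of round-trip check, using only the JDK so it runs on its own. The class name, the Base64 encoding, and the "naive" text encoding are hypothetical stand-ins, not what Cassandra, Thrift, or Hadoop actually do. In the real job the comparison would be between the SlicePredicate you set and the one getSlicePredicate(conf) returns. The sketch only shows how byte[8] column names built from longs can survive one string encoding and be altered by another, and how comparing before and after exposes the difference.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical stand-in for a predicate round-trip test; no Cassandra classes involved.
class PredicateRoundTrip {

    /** Encode a long (for example a timestamp) as an 8-byte big-endian column name. */
    static byte[] toColumnName(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    /** Assumed lossless encoding: Base64 each name and join with commas. */
    static String encodeBase64(List<byte[]> names) {
        List<String> parts = new ArrayList<>();
        for (byte[] name : names) {
            parts.add(Base64.getEncoder().encodeToString(name));
        }
        return String.join(",", parts);
    }

    /** Inverse of encodeBase64. */
    static List<byte[]> decodeBase64(String encoded) {
        List<byte[]> names = new ArrayList<>();
        if (encoded.isEmpty()) {
            return names;
        }
        for (String part : encoded.split(",")) {
            names.add(Base64.getDecoder().decode(part));
        }
        return names;
    }

    /** Assumed lossy encoding: treat the raw bytes as UTF-8 text and convert back. */
    static boolean naiveRoundTrips(byte[] name) {
        String asText = new String(name, StandardCharsets.UTF_8);
        return Arrays.equals(name, asText.getBytes(StandardCharsets.UTF_8));
    }

    /** Compare two lists of column names byte for byte. */
    static boolean sameNames(List<byte[]> a, List<byte[]> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!Arrays.equals(a.get(i), b.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<byte[]> names = Arrays.asList(
                toColumnName(-1L),                   // eight 0xFF bytes, never valid UTF-8
                toColumnName(0x4142434445464748L));  // the ASCII bytes "ABCDEFGH"

        List<byte[]> restored = decodeBase64(encodeBase64(names));
        System.out.println("base64 round trip equal: " + sameNames(names, restored));

        for (byte[] name : names) {
            System.out.println(Arrays.toString(name)
                    + " survives naive text encoding: " + naiveRoundTrips(name));
        }
    }
}
```

If the equivalent comparison fails for the real predicate, listing which column names differ before and after should narrow down whether the serializer or a length limit is responsible.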

On Mon, May 3, 2010 at 8:15 PM, Mark Schnitzius
<mark.schnitzius@cxense.com> wrote:
> If I take the exact same SlicePredicate that fails in the Hadoop example,
> and pass it in to a multiget_slice, the data is returned successfully.  So
> it appears the problem does lie somewhere in the tie-in to Hadoop.
> I will try to create a maximally-trimmed-down example that's complete enough
> to run on its own that demonstrates the failure, and will post here.  I was
> just hoping that there might've been an easy fix recognizable from my
> description before I had to resort to that...
>
> Thanks
> Mark
>
>
> On Tue, May 4, 2010 at 1:40 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> Can you reproduce outside the Hadoop environment, i.e. w/ Thrift code?
>>
>> On Mon, May 3, 2010 at 5:49 AM, Mark Schnitzius
>> <mark.schnitzius@cxense.com> wrote:
>> > Hi all...  I am trying to feed a specific list of Cassandra column names
>> > in as input to a Hadoop process, but for some reason it only feeds in
>> > some of the columns I specify, not all.
>> >
>> > This is a short description of the problem - I'll see if anyone might
>> > have some insight before I dump a big load of code on you...
>> >
>> > 1.  I've uploaded a bunch of data into Cassandra; the column names are
>> >     longs (dates, basically) converted to byte[8].
>> > 2.  I can successfully set a SlicePredicate using setSlice_range to
>> >     return all the data for a set of columns.
>> > 3.  However, if I instead call setColumn_names on the SlicePredicate,
>> >     only some of the specified columns get fed into Hadoop.
>> > 4.  This faulty behavior is repeatable, with the same columns going
>> >     missing each time for the same input parameters.
>> > 5.  For the values that fail, I've made fairly certain that the value
>> >     for the column name is getting inserted successfully, and that the
>> >     exact same column name is specified in the call to setColumn_names.
>> >
>> > Any clues?
>> >
>> > AdTHANKSvance,
>> > Mark
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
