From: Jonathan Ellis
Date: Mon, 3 May 2010 23:27:08 -0500
Subject: Re: Feeding in specific Cassandra columns into Hadoop
To: user@cassandra.apache.org

We serialize the SlicePredicate as part of the Hadoop Configuration
string. It's quite possible that either

- one of your column names is exposing a bug in the Thrift json serializer
- Hadoop is silently truncating large predicates

You should test that getSlicePredicate(conf).equals(originalPredicate)

On Mon, May 3, 2010 at 8:15 PM, Mark Schnitzius wrote:
> If I take the exact same SlicePredicate that fails in the Hadoop example,
> and pass it in to a multiget_slice, the data is returned successfully.  So
> it appears the problem does lie somewhere in the tie-in to Hadoop.
> I will try to create a maximally-trimmed-down example that's complete enough
> to run on its own that demonstrates the failure, and will post here.  I was
> just hoping that there might've been an easy fix recognizable from my
> description before I had to resort to that...
>
> Thanks
> Mark
>
>
> On Tue, May 4, 2010 at 1:40 AM, Jonathan Ellis wrote:
>>
>> Can you reproduce outside the Hadoop environment, i.e. w/ Thrift code?
>>
>> On Mon, May 3, 2010 at 5:49 AM, Mark Schnitzius wrote:
>> > Hi all...  I am trying to feed a specific list of Cassandra column names
>> > in as input to a Hadoop process, but for some reason it only feeds in
>> > some of the columns I specify, not all.
>> > This is a short description of the problem - I'll see if anyone might
>> > have some insight before I dump a big load of code on you...
>> > 1.  I've uploaded a bunch of data into Cassandra; the column names as
>> > longs (dates, basically) converted to byte[8].
>> > 2.  I can successfully set a SlicePredicate using setSlice_range to
>> > return all the data for a set of columns.
>> > 3.  However, if I instead call setColumn_names on the SlicePredicate,
>> > only some of the specified columns get fed into Hadoop.
>> > 4.  This faulty behavior is repeatable, with the same columns going
>> > missing each time for the same input parameters.
>> > 5.  For the values that fail, I've made fairly certain that the value
>> > for the column name is getting inserted successfully, and that the exact
>> > same column name is specified in the call to setColumn_names.
>> > Any clues?
>> >
>> > AdTHANKSvance,
>> > Mark
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
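
Illustration of the round-trip check suggested in the reply
(getSlicePredicate(conf).equals(originalPredicate)). This is a minimal
sketch in plain Python, not Cassandra or Thrift code: column_name,
lossy_to_string, safe_to_string and surviving are hypothetical names, and
the "lossy" serializer is an assumed stand-in for whatever might damage a
predicate on its way through a configuration string. It makes no claim
about the actual cause of the problem in this thread. It only shows how
binary column names built from longs can fail to survive a text encoding,
repeatably and for the same names each time, and how an equality check
after the round trip exposes that.

```python
import base64
import json
import struct


def column_name(value: int) -> bytes:
    """Encode a long as an 8-byte big-endian name, like a Java byte[8]."""
    return struct.pack(">q", value)


def lossy_to_string(names):
    # Assumed buggy serializer: treats raw bytes as UTF-8 text.
    # Bytes that are not valid UTF-8 are silently replaced.
    return json.dumps([n.decode("utf-8", errors="replace") for n in names])


def lossy_from_string(text):
    return [s.encode("utf-8") for s in json.loads(text)]


def safe_to_string(names):
    # Base64 keeps every byte intact inside a text-only container.
    return json.dumps([base64.b64encode(n).decode("ascii") for n in names])


def safe_from_string(text):
    return [base64.b64decode(s) for s in json.loads(text)]


def surviving(original, to_string, from_string):
    """Return the names that come back unchanged after a round trip."""
    restored = from_string(to_string(original))
    return [a for a, b in zip(original, restored) if a == b]


# Illustrative values only; real column names would be date-derived longs.
names = [column_name(v) for v in (1, 127, 128, 255)]

print(len(surviving(names, lossy_to_string, lossy_from_string)))  # prints 2
print(len(surviving(names, safe_to_string, safe_from_string)))    # prints 4
```

In the sketch, the names ending in 0x80 and 0xFF are lost by the lossy
encoding on every run, while the base64 variant returns all four. The same
kind of comparison against the deserialized predicate would show whether
the column name list is being altered before it reaches the Hadoop job.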