Date: Mon, 19 Apr 2010 19:41:33 -0500
Subject: Re: Help with MapReduce
From: Jesse McConnell
To: user@cassandra.apache.org

err not count in your case, but same symptom, cassandra can't return
the answer to your query in the configured rpctimeout time

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com

On Mon, Apr 19, 2010 at 19:40, Jesse McConnell wrote:
> most likely means that the count() operation is taking too long for
> the configured RPCTimeout
>
> counts get unreliable after a certain number of columns under a key in
> my experience
>
> jesse
>
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
>
> On Mon, Apr 19, 2010 at 19:12, Joost Ouwerkerk wrote:
>> I'm slowly getting somewhere with Cassandra... I have successfully imported
>> 1.5 million rows using MapReduce. This took about 8 minutes on an 8-node
>> cluster, which is comparable to the time it takes with HBase.
>> Now I'm having trouble scanning this data. I've created a simple MapReduce
>> job that counts rows in my ColumnFamily. The Job fails with most tasks
>> throwing the following Exception. Anyone have any ideas what's going wrong?
>> java.lang.RuntimeException: TimedOutException()
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
>>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: TimedOutException()
>>     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
>>     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
>>     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
>>     ... 11 more
>>
>> On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood wrote:
>>>
>>> In 0.6.0 and trunk, it is located at
>>> src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
>>>
>>> You might be using a pre-release version of 0.6 if you are seeing a fat
>>> client based InputFormat.
>>>
>>> -----Original Message-----
>>> From: "Joost Ouwerkerk"
>>> Sent: Sunday, April 18, 2010 4:53pm
>>> To: user@cassandra.apache.org
>>> Subject: Re: Help with MapReduce
>>>
>>> Where is the ColumnFamilyInputFormat that uses Thrift? I don't actually
>>> have a preference about client, I just want to be consistent with
>>> ColumnInputFormat.
>>>
>>> On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood wrote:
>>>
>>> > ColumnFamilyInputFormat no longer uses the fat client API, and instead
>>> > uses Thrift. There are still some significant problems with the fat
>>> > client, so it shouldn't be used without a good understanding of those
>>> > problems.
>>> >
>>> > If you still want to use it, check out contrib/bmt_example, but I'd
>>> > recommend that you use thrift for now.
>>> >
>>> > -----Original Message-----
>>> > From: "Joost Ouwerkerk"
>>> > Sent: Sunday, April 18, 2010 2:59pm
>>> > To: user@cassandra.apache.org
>>> > Subject: Help with MapReduce
>>> >
>>> > I'm a Cassandra noob trying to validate Cassandra as a viable
>>> > alternative to HBase (which we've been using for over a year) for our
>>> > application. So far, I've had no success getting Cassandra working
>>> > with MapReduce.
>>> >
>>> > My first step is inserting data into Cassandra. I've created a MapRed
>>> > job based on the fat client API. I'm using the fat client
>>> > (StorageProxy) because that's what ColumnFamilyInputFormat uses and I
>>> > want to use the same API for both read and write jobs.
>>> >
>>> > When I call StorageProxy.mutate(), nothing happens.
>>> > The job completes as if it had done something, but in fact nothing
>>> > has changed in the cluster. When I call
>>> > StorageProxy.mutateBlocking(), I get an IOException complaining that
>>> > there is no connection to the cluster. I've concluded with the
>>> > debugger that StorageService is not connecting to the cluster, even
>>> > though I've specified the correct seed and ListenAddress (I'm using
>>> > the exact same storage-conf.xml as the nodes in the cluster).
>>> >
>>> > I'm sure I'm missing something obvious in the configuration or my
>>> > setup, but since I'm new to Cassandra, I can't see what it is.
>>> >
>>> > Any help appreciated,
>>> > Joost
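
A rough sketch of the symptom described at the top of this message, assuming the cost of a range scan grows with the number of rows requested per call. Nothing here is Cassandra's API: fake_range_slice, count_rows, cost_ms_per_row and rpc_timeout_ms are made-up names for a toy simulation, and halving the batch after a timeout is a generic workaround that nobody in this thread proposed.

```python
class TimedOut(Exception):
    """Stand-in for the TimedOutException seen in the stack trace."""


def fake_range_slice(rows, start, count, cost_ms_per_row, rpc_timeout_ms):
    """Simulated server call: the cost grows with the rows actually returned."""
    batch = rows[start:start + count]
    if len(batch) * cost_ms_per_row > rpc_timeout_ms:
        # The simulated server could not answer within the timeout.
        raise TimedOut()
    return batch


def count_rows(rows, batch_size, cost_ms_per_row=1, rpc_timeout_ms=100):
    """Count rows page by page, halving the batch size after each timeout.

    Returns (row_count, batch_size_that_finally_worked).
    """
    total = 0
    start = 0
    while start < len(rows):
        try:
            batch = fake_range_slice(
                rows, start, batch_size, cost_ms_per_row, rpc_timeout_ms
            )
        except TimedOut:
            if batch_size == 1:
                raise  # even a single row is too slow, so give up
            batch_size = max(1, batch_size // 2)
            continue  # retry the same page with the smaller batch
        total += len(batch)
        start += len(batch)
    return total, batch_size


if __name__ == "__main__":
    rows = list(range(1000))
    print(count_rows(rows, batch_size=4096))
```

In this toy model a request for 4096 rows keeps timing out until the batch shrinks far enough for a single call to fit inside the timeout; a real cluster's cost per row depends on row width, load and replication, so the numbers mean nothing beyond the shape of the failure.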