Date: Mon, 19 Apr 2010 19:40:17 -0500
Subject: Re: Help with MapReduce
From: Jesse McConnell
To: user@cassandra.apache.org

most likely means that the count() operation is taking too long for
the configured RPCTimeout

counts get unreliable after a certain number of columns under a key in
my experience

jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com

On Mon, Apr 19, 2010 at 19:12, Joost Ouwerkerk wrote:
> I'm slowly getting somewhere with Cassandra... I have successfully imported
> 1.5 million rows using MapReduce. This took about 8 minutes on an 8-node
> cluster, which is comparable to the time it takes with HBase.
> Now I'm having trouble scanning this data. I've created a simple MapReduce
> job that counts rows in my ColumnFamily. The Job fails with most tasks
> throwing the following Exception. Anyone have any ideas what's going wrong?
> java.lang.RuntimeException: TimedOutException()
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: TimedOutException()
>     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
>     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
>     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
>     ... 11 more
>
> On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood wrote:
>>
>> In 0.6.0 and trunk, it is located at
>> src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
>>
>> You might be using a pre-release version of 0.6 if you are seeing a fat
>> client based InputFormat.
>>
>>
>> -----Original Message-----
>> From: "Joost Ouwerkerk"
>> Sent: Sunday, April 18, 2010 4:53pm
>> To: user@cassandra.apache.org
>> Subject: Re: Help with MapReduce
>>
>> Where is the ColumnFamilyInputFormat that uses Thrift? I don't actually
>> have a preference about client, I just want to be consistent with
>> ColumnInputFormat.
>>
>> On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood wrote:
>>
>> > ColumnFamilyInputFormat no longer uses the fat client API, and instead
>> > uses Thrift. There are still some significant problems with the fat
>> > client, so it shouldn't be used without a good understanding of those
>> > problems.
>> >
>> > If you still want to use it, check out contrib/bmt_example, but I'd
>> > recommend that you use thrift for now.
>> >
>> > -----Original Message-----
>> > From: "Joost Ouwerkerk"
>> > Sent: Sunday, April 18, 2010 2:59pm
>> > To: user@cassandra.apache.org
>> > Subject: Help with MapReduce
>> >
>> > I'm a Cassandra noob trying to validate Cassandra as a viable
>> > alternative to HBase (which we've been using for over a year) for our
>> > application. So far, I've had no success getting Cassandra working
>> > with MapReduce.
>> >
>> > My first step is inserting data into Cassandra. I've created a MapRed
>> > job based on the fat client API. I'm using the fat client (StorageProxy)
>> > because that's what ColumnFamilyInputFormat uses and I want to use the
>> > same API for both read and write jobs.
>> >
>> > When I call StorageProxy.mutate(), nothing happens. The job completes
>> > as if it had done something, but in fact nothing has changed in the
>> > cluster. When I call StorageProxy.mutateBlocking(), I get an
>> > IOException complaining that there is no connection to the cluster.
>> > I've concluded with the debugger
>> > that StorageService is not connecting to the cluster, even though I've
>> > specified the correct seed and ListenAddress (I'm using the exact same
>> > storage-conf.xml as the nodes in the cluster).
>> >
>> > I'm sure I'm missing something obvious in the configuration or my setup,
>> > but since I'm new to Cassandra, I can't see what it is.
>> >
>> > Any help appreciated,
>> > Joost