From: Ralph Romanos
Subject: Slow Reads in Cassandra with Hadoop
Date: Thu, 6 Dec 2012 15:29:23 +0000

Hello Cassandra users,

I am trying to read and process data in Cassandra using Hadoop. I have a 4-node Cassandra cluster and an 8-node Hadoop cluster:
- 1 Namenode/Jobtracker
- 7 Datanodes/Tasktrackers (4 of them are also hosting Cassandra)

I am using Cassandra 1.2 beta, Hadoop 0.20.2 and Java 1.6_u_34. Seven of my nodes are on SLES 10 (Linux kernel: 2.6.16.60-0.76.8-smp) and the last one is on SLES 11 (Linux kernel: 2.6.32.12-0.7-default). They all have 24 cores and 33 GB of RAM, but for some reason the node running on SLES 11 runs Hadoop jobs significantly faster than the others (two to three times faster); any explanation for this is welcome as well.

In my Hadoop job, I am using ColumnFamilyInputFormat and ColumnFamilyOutputFormat.
Here is my mapper: Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, Text>,
and my reducer: Reducer<Text, Text, ByteBuffer, List<Mutation>>.

The input of my mapper is the values of the columns given in input. In the output of my map, I write those values in the Text format, separated by commas. I ran the task on about 400 million rows in my database, so the map function is called once for each row. When I run the job with 6 concurrent map tasks on each server and 7 Hadoop servers, the job takes about an hour and a half (the reduce step is done in about 5 seconds, so the problem is in the map task), which is too long...
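For scale, the figures above can be turned into an average throughput. This is only a back-of-the-envelope sketch: it assumes the job runs for exactly 90 minutes and that all 6 x 7 = 42 map slots stay busy for the whole run, neither of which was measured. The class and method names are purely illustrative.

```java
// Rough throughput estimate for the job described above.
// Assumptions (not measured): exactly 90 minutes of runtime, all map slots busy throughout.
final class JobThroughput {

    /** Rows processed per second by the whole cluster. */
    static double clusterRowsPerSecond(long totalRows, double runtimeSeconds) {
        return totalRows / runtimeSeconds;
    }

    /** Rows per second handled by one map slot, assuming rows are spread evenly. */
    static double rowsPerSecondPerSlot(long totalRows, double runtimeSeconds, int slots) {
        return clusterRowsPerSecond(totalRows, runtimeSeconds) / slots;
    }

    public static void main(String[] args) {
        long totalRows = 400000000L;      // "about 400 million rows"
        double runtimeSeconds = 90 * 60;  // "about an hour and a half"
        int slots = 6 * 7;                // 6 concurrent map tasks on 7 Hadoop servers

        System.out.printf("cluster:  %.0f rows/s%n",
                clusterRowsPerSecond(totalRows, runtimeSeconds));
        System.out.printf("per slot: %.0f rows/s%n",
                rowsPerSecondPerSlot(totalRows, runtimeSeconds, slots));
    }
}
```

Under these assumptions the cluster averages about 74,000 rows per second, which is roughly 1,760 rows per second per map slot.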
So I set some timers between each call to the map function, and here is what I get:

After mapping about 4150 - 4160 rows in Cassandra (each row has 8 columns, and values are strings or longs) in approximately 60 ms, there is a gap in time.
This gap is not the same for all the machines:
- it is 200 ms on the Cassandra + Hadoop node that is running on SLES 11 (Cassandra is using 400% CPU on this node)
- it is 4200 ms on the 3 nodes that are Hadoop only
- it is 900 ms on the two nodes that are Cassandra + Hadoop and running on SLES 10 (Cassandra is using 400% CPU on these nodes)
- it is 4200 ms on the last Cassandra + Hadoop node (Cassandra is using 2300% CPU on this node, and I get a lot of garbage collection messages in the Cassandra logs of this node only)

When I run only 1 concurrent map task per node (instead of the 6 above), I get the following results:
- it is 200 ms on the Cassandra + Hadoop node that is running on SLES 11 (Cassandra is using 150% CPU on this node)
- it is 600 ms on the 3 nodes that are Hadoop only
- it is 600 ms on the two nodes that are Cassandra + Hadoop and running on SLES 10 (Cassandra is using 150% CPU on these nodes)
- it is 600 ms on the last Cassandra + Hadoop node (Cassandra is using 400% CPU on this node, and I no longer get garbage collection messages in the Cassandra logs)

I do not really know what is happening during this gap; my guess would be that Hadoop is reading data in Cassandra, streaming it to the Hadoop nodes and finally writing it to the Hadoop Distributed File System.
Does anyone understand how reads are done when using Hadoop and Cassandra? What exactly is happening during this gap in time? And why is there such a difference in time between the nodes running on SLES 10 and the node running on SLES 11?
Why does it seem like this gap in time is smaller on nodes running Cassandra + Hadoop?
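The timers mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from the actual job: `MapGapTracker` is a hypothetical helper that is not part of Hadoop or Cassandra, and the 100 ms threshold is an arbitrary choice. It would be called at the top of each map() invocation with `System.nanoTime()`; here the timestamps are passed in explicitly so the logic can be checked without a cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop or Cassandra class): records the time between
// successive map() calls and reports every pause longer than a threshold.
final class MapGapTracker {

    /** One detected pause: rows mapped since the previous pause, and the pause length. */
    static final class Gap {
        final long rowsBefore;
        final long gapMillis;

        Gap(long rowsBefore, long gapMillis) {
            this.rowsBefore = rowsBefore;
            this.gapMillis = gapMillis;
        }
    }

    private final long thresholdNanos;
    private final List<Gap> gaps = new ArrayList<Gap>();
    private boolean started;
    private long lastCallNanos;
    private long rowsSinceGap;

    MapGapTracker(long thresholdMillis) {
        this.thresholdNanos = thresholdMillis * 1000000L;
    }

    /** Call once per map() invocation; pass System.nanoTime() in real use. */
    void onMapCall(long nowNanos) {
        if (started) {
            long delta = nowNanos - lastCallNanos;
            if (delta > thresholdNanos) {
                // A long pause: remember how many rows were mapped before it.
                gaps.add(new Gap(rowsSinceGap, delta / 1000000L));
                rowsSinceGap = 0;
            }
        }
        started = true;
        lastCallNanos = nowNanos;
        rowsSinceGap++;
    }

    List<Gap> gaps() {
        return gaps;
    }

    /** Effective rows per second when every burst of rows is followed by one pause. */
    static double effectiveRowsPerSecond(long rowsPerBurst, long burstMillis, long gapMillis) {
        return rowsPerBurst * 1000.0 / (burstMillis + gapMillis);
    }

    public static void main(String[] args) {
        // Figures from the message: about 4155 rows in 60 ms, then a gap.
        long[] gapsMillis = {200, 900, 4200};
        for (long gap : gapsMillis) {
            System.out.printf("gap %d ms -> %.0f rows/s per task%n",
                    gap, effectiveRowsPerSecond(4155, 60, gap));
        }
    }
}
```

Plugging in the measurements above (about 4155 rows in 60 ms, followed by the gap) gives roughly 16,000 rows/s per task with a 200 ms gap, about 4,300 rows/s with a 900 ms gap, and under 1,000 rows/s with a 4200 ms gap, so the gap, not the map function itself, dominates the map time.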
Finally, does anyone know why this gap in time occurs after approximately 4160 rows, which represent about 32 KB in my case? Is there any parameter I am not aware of to change this?

Thanks in advance,
Ralph