Subject: Re: Experiences with Map&Reduce Stress Tests
From: Subscriber
Date: Mon, 2 May 2011 13:25:39 +0200
To: user@cassandra.apache.org

Hi Jeremy,

thanks for the link. I doubled the rpc_timeout (20 seconds) and reduced the range-batch-size to 2048, but I still get timeouts...

Udo

On 29.04.2011, at 18:53, Jeremy Hanna wrote:

> It sounds like there might be some tuning you can do to your jobs - take a look at the wiki's HadoopSupport page, specifically the Troubleshooting section:
> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
> 
> On Apr 29, 2011, at 11:45 AM, Subscriber wrote:
> 
>> Hi all,
>> 
>> We want to share the experiences we gathered during our evaluation of Cassandra together with Hadoop Map/Reduce.
>> Our question was whether Cassandra is suitable for massive distributed data writes using Hadoop's Map/Reduce feature.
>> 
>> Our setup is described in the attached file 'cassandra_stress_setup.txt'.
>> 
>> The stress test uses 800 map tasks to generate data and store it in Cassandra.
>> Each map task writes 500,000 items (i.e. rows), resulting in 400,000,000 items in total.
>> At most 8 map tasks run in parallel on each node. An item contains (besides the key) two long and two double values,
>> so each item is a few hundred bytes in size. This leads to a total data size of approximately 120 GB.
>> 
>> The map tasks use the Hector API. Hector is fed with all three data nodes. The data is written in chunks of 1000 items.
>> The ConsistencyLevel is set to ONE.
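
To make the write path described above concrete, here is a minimal sketch of such a chunked Hector write at ConsistencyLevel ONE (roughly the Hector 0.7.x API). Only the node addresses, the keyspace ItemRepo and the column family Items appear elsewhere in the thread; the cluster name, column names, key format and Thrift port 9160 are illustrative assumptions:

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class ItemWriterSketch {
        public static void main(String[] args) {
            // Hand all three data nodes to Hector (Thrift port 9160 assumed).
            Cluster cluster = HFactory.getOrCreateCluster("ItemCluster",
                    new CassandraHostConfigurator(
                            "192.168.11.198:9160,192.168.11.199:9160,192.168.11.202:9160"));

            // Write at ConsistencyLevel ONE, as in the test setup.
            ConfigurableConsistencyLevel cl = new ConfigurableConsistencyLevel();
            cl.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);
            Keyspace keyspace = HFactory.createKeyspace("ItemRepo", cluster, cl);

            StringSerializer ss = StringSerializer.get();
            LongSerializer ls = LongSerializer.get();
            Mutator<String> mutator = HFactory.createMutator(keyspace, ss);

            // Queue up one chunk of 1000 items and push it as a single batch_mutate call.
            for (int i = 0; i < 1000; i++) {
                String key = "item-" + i;
                mutator.addInsertion(key, "Items",
                        HFactory.createColumn("long1", (long) i, ss, ls));
                mutator.addInsertion(key, "Items",
                        HFactory.createColumn("long2", (long) i * 2, ss, ls));
            }
            mutator.execute();
        }
    }

(The real items also carry two double columns; only the long columns are shown to keep the sketch short.)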
>> 
>> We ran the stress tests in several runs with different configuration settings (for example, I started with Cassandra's default configuration, and I used Pelops for another test).
>> 
>> Our observations are as follows:
>> 
>> 1) Cassandra is really fast - we are really impressed by the huge write throughput. A map task writing 500,000 items (approx. 200 MB) usually finishes in under 5 minutes.
>> 2) However - unfortunately all tests failed in the end.
>> 
>> In the beginning there are no problems. The first 100 (in some tests even the first 300!) map tasks look fine. But then the trouble starts.
>> 
>> Hadoop's sample output after ~15 minutes:
>> 
>> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
>> map     14.99%      800        680      24       96        0       0 / 0
>> reduce  3.99%       1          0        1        0         0       0 / 0
>> 
>> Some stats:
>>> top
>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 31159 xxxx  20   0 2569m 2.2g 9.8m S  450 18.6 61:44.73  java
>> 
>>> vmstat 1 5
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd   free   buff   cache   si   so    bi    bo    in     cs us sy id wa
>>  2  1  36832 353688 242820 6837520    0    0    15    73     3      2  3  0 96  0
>> 11  1  36832 350992 242856 6852136    0    0  1024 20900  4508  11738 19  1 74  6
>>  8  0  36832 339728 242876 6859828    0    0     0  1068 45809 107008 69 10 20  0
>>  1  0  36832 330212 242884 6868520    0    0     0    80 42112  92930 71  8 21  0
>>  2  0  36832 311888 242908 6887708    0    0  1024     0 20277  46669 46  7 47  0
>> 
>>> cassandra/bin/nodetool -h tirdata1 -p 28080 ring
>> Address         Status  State   Load     Owns    Token
>>                                                  113427455640312821154458202477256070484
>> 192.168.11.198  Up      Normal  6.72 GB  33.33%  0
>> 192.168.11.199  Up      Normal  6.72 GB  33.33%  56713727820156410577229101238628035242
>> 192.168.11.202  Up      Normal  6.68 GB  33.33%  113427455640312821154458202477256070484
>> 
>> 
>> Hadoop's sample output after ~20 minutes:
>> 
>> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
>> map     15.49%      800        673      24       103       0       6 / 0
>> reduce  4.16%       1          0        1        0         0       0 / 0
>> 
>> What went wrong? It's always the same: the clients cannot reach the nodes anymore.
>> 
>> java.lang.RuntimeException: work failed
>>     at com.zfabrik.hadoop.impl.HadoopProcessRunner.work(HadoopProcessRunner.java:109)
>>     at com.zfabrik.hadoop.impl.DelegatingMapper.run(DelegatingMapper.java:40)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:625)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at com.zfabrik.hadoop.impl.HadoopProcessRunner.work(HadoopProcessRunner.java:107)
>>     ... 4 more
>> Caused by: java.lang.RuntimeException: me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
>>     at com.zfabrik.hadoop.impl.DelegatingMapper$1.run(DelegatingMapper.java:47)
>>     at com.zfabrik.work.WorkUnit.work(WorkUnit.java:342)
>>     at com.zfabrik.impl.launch.ProcessRunnerImpl.work(ProcessRunnerImpl.java:189)
>>     ... 9 more
>> Caused by: me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
>>     at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:88)
>>     at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>>     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:221)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:100)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:106)
>>     at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:203)
>>     at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:200)
>>     at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
>>     at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
>>     at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:200)
>>     at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper._flush(HectorBasedMassItemGenMapper.java:122)
>>     at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper.map(HectorBasedMassItemGenMapper.java:103)
>>     at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper.map(HectorBasedMassItemGenMapper.java:1)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>     at com.zfabrik.hadoop.impl.DelegatingMapper$1.run(DelegatingMapper.java:45)
>>     ... 11 more
>> Caused by: UnavailableException()
>>     at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16485)
>>     at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
>>     at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:93)
>>     ... 27 more
>> 
>> -------
>> Task attempt_201104291345_0001_m_000028_0 failed to report status for 602 seconds. Killing!
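
To make the tuning mentioned in this thread concrete (the wiki's Troubleshooting advice, the range-batch-size of 2048 from the top of the thread, and the 600-second task timeout behind the "failed to report status for 602 seconds" message), here is a rough sketch of where those knobs usually live. It assumes the 0.7-era org.apache.cassandra.hadoop.ConfigHelper and old-style mapred.* property names; the rpc_timeout itself is a per-node setting (rpc_timeout_in_ms in cassandra.yaml) and is not part of the job configuration:

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobTuningSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Fewer rows per get_range_slices call when reading from Cassandra
            // (the "range-batch-size" mentioned near the top of the thread; default 4096).
            ConfigHelper.setRangeBatchSize(conf, 2048);
            // Equivalent raw property, as named on the HadoopSupport wiki page:
            // conf.setInt("cassandra.range.batch.size", 2048);

            // Give long-running tasks more time before Hadoop kills them
            // ("failed to report status for 602 seconds"); default is 600000 ms.
            conf.setLong("mapred.task.timeout", 1200000L);

            Job job = new Job(conf, "item-generator");
            // ... mapper/reducer setup as in the original job ...
        }
    }

Raising mapred.task.timeout only keeps Hadoop from killing slow tasks; it does not address the underlying UnavailableException on batch_mutate.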
>> 
>> I also observed that when connecting with cassandra-cli during the stress test, it was not possible to list the items written so far:
>> 
>> [default@unknown] use ItemRepo;
>> Authenticated to keyspace: ItemRepo
>> [default@ItemRepo] list Items;
>> Using default limit of 100
>> Internal error processing get_range_slices
>> 
>> It seems to me that from the point I performed the read operation in the cli tool, the node somehow becomes confused.
>> Looking at jconsole shows that up to this point the heap is fine: it grows and GC clears it again.
>> But from this point on, GC doesn't really help anymore (see attached screenshot).
>> 
>> This also has an impact on the other nodes, as one can see in the second screenshot. The CPU usage goes down, as does the heap memory usage.
>> 
>> I'll run another stress test at the weekend with
>> 
>> * MAX_HEAP_SIZE="4G"
>> * HEAP_NEWSIZE="400M"
>> 
>> Best Regards
>> Udo
>> 
> 
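
As a side note on the jconsole observations above: the same heap numbers can also be read programmatically over JMX, which can help when watching an unattended weekend run. A minimal sketch using only standard JDK classes, assuming the JMX endpoint used by nodetool above (host tirdata1, port 28080) is reachable without authentication:

    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;
    import javax.management.JMX;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class HeapProbeSketch {
        public static void main(String[] args) throws Exception {
            // Same JMX host/port that nodetool uses above (-h tirdata1 -p 28080).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://tirdata1:28080/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();

            // Read the same heap usage that jconsole's Memory tab shows.
            MemoryMXBean memory = JMX.newMXBeanProxy(mbsc,
                    new ObjectName("java.lang:type=Memory"), MemoryMXBean.class);
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
            connector.close();
        }
    }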