Subject: Re: Experiences with Map&Reduce Stress Tests
From: Subscriber
Date: Mon, 2 May 2011 13:25:39 +0200
To: user@cassandra.apache.org

Hi Jeremy,

thanks for the link. I doubled the rpc_timeout (20 seconds) and reduced the range-batch-size to 2048, but I still get timeouts...

Udo

On 29.04.2011, at 18:53, Jeremy Hanna wrote:

> It sounds like there might be some tuning you can do to your jobs - take a look at the wiki's HadoopSupport page, specifically the Troubleshooting section:
> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
> 
> On Apr 29, 2011, at 11:45 AM, Subscriber wrote:
> 
>> Hi all,
>> 
>> We want to share the experiences we gathered during our evaluation of Cassandra together with Hadoop Map/Reduce.
>> Our question was whether Cassandra is suitable for massive distributed data writes using Hadoop's Map/Reduce feature.
>> 
>> Our setup is described in the attached file 'cassandra_stress_setup.txt'.
>> 
>> The stress test uses 800 map tasks to generate data and store it in Cassandra.
>> Each map task writes 500,000 items (i.e. rows), resulting in 400,000,000 items in total.
>> At most 8 map tasks run in parallel on each node. An item contains (besides the key) two long and two double values,
>> so each item is a few hundred bytes in size. This leads to a total data size of approximately 120 GB.
>> 
>> The map tasks use the Hector API. Hector is fed with all three data nodes. The data is written in chunks of 1000 items.
>> The ConsistencyLevel is set to ONE.
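
To make the write path described above concrete, here is a minimal sketch of such a chunked Hector write at ConsistencyLevel ONE (roughly the Hector 0.7.x API). Only the node addresses, the keyspace ItemRepo and the column family Items appear elsewhere in the thread; the cluster name, column names, key format and Thrift port 9160 are illustrative assumptions:

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class ItemWriterSketch {
        public static void main(String[] args) {
            // Hand all three data nodes to Hector (Thrift port 9160 assumed).
            Cluster cluster = HFactory.getOrCreateCluster("ItemCluster",
                    new CassandraHostConfigurator(
                            "192.168.11.198:9160,192.168.11.199:9160,192.168.11.202:9160"));

            // Write at ConsistencyLevel ONE, as in the test setup.
            ConfigurableConsistencyLevel cl = new ConfigurableConsistencyLevel();
            cl.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);
            Keyspace keyspace = HFactory.createKeyspace("ItemRepo", cluster, cl);

            StringSerializer ss = StringSerializer.get();
            LongSerializer ls = LongSerializer.get();
            Mutator<String> mutator = HFactory.createMutator(keyspace, ss);

            // Queue up one chunk of 1000 items and push it as a single batch_mutate call.
            for (int i = 0; i < 1000; i++) {
                String key = "item-" + i;
                mutator.addInsertion(key, "Items",
                        HFactory.createColumn("long1", (long) i, ss, ls));
                mutator.addInsertion(key, "Items",
                        HFactory.createColumn("long2", (long) i * 2, ss, ls));
            }
            mutator.execute();
        }
    }

(The real items also carry two double columns; only the long columns are shown to keep the sketch short.)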
>> 
>> We ran the stress tests in several runs with different configuration settings (for example, I started with Cassandra's default configuration, and I used Pelops for another test).
>> 
>> Our observations are as follows:
>> 
>> 1) Cassandra is really fast - we are really impressed by the huge write throughput. A map task writing 500,000 items (approx. 200 MB) usually finishes in under 5 minutes.
>> 2) However - unfortunately all tests failed in the end.
>> 
>> In the beginning there are no problems. The first 100 (in some tests even the first 300!) map tasks look fine. But then the trouble starts.
>> 
>> Hadoop's sample output after ~15 minutes:
>> 
>> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
>> map     14.99%      800        680      24       96        0       0 / 0
>> reduce  3.99%       1          0        1        0         0       0 / 0
>> 
>> Some stats:
>>> top
>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 31159 xxxx  20   0 2569m 2.2g 9.8m S  450 18.6 61:44.73  java
>> 
>>> vmstat 1 5
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd   free   buff   cache   si   so    bi    bo    in     cs us sy id wa
>>  2  1  36832 353688 242820 6837520    0    0    15    73     3      2  3  0 96  0
>> 11  1  36832 350992 242856 6852136    0    0  1024 20900  4508  11738 19  1 74  6
>>  8  0  36832 339728 242876 6859828    0    0     0  1068 45809 107008 69 10 20  0
>>  1  0  36832 330212 242884 6868520    0    0     0    80 42112  92930 71  8 21  0
>>  2  0  36832 311888 242908 6887708    0    0  1024     0 20277  46669 46  7 47  0
>> 
>>> cassandra/bin/nodetool -h tirdata1 -p 28080 ring
>> Address         Status  State   Load     Owns    Token
>>                                                  113427455640312821154458202477256070484
>> 192.168.11.198  Up      Normal  6.72 GB  33.33%  0
>> 192.168.11.199  Up      Normal  6.72 GB  33.33%  56713727820156410577229101238628035242
>> 192.168.11.202  Up      Normal  6.68 GB  33.33%  113427455640312821154458202477256070484
>> 
>> 
>> Hadoop's sample output after ~20 minutes:
>> 
>> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
>> map     15.49%      800        673      24       103       0       6 / 0
>> reduce  4.16%       1          0        1        0         0       0 / 0
>> 
>> What went wrong? It's always the same: the clients cannot reach the nodes anymore.
>> 
>> java.lang.RuntimeException: work failed
>>     at com.zfabrik.hadoop.impl.HadoopProcessRunner.work(HadoopProcessRunner.java:109)
>>     at com.zfabrik.hadoop.impl.DelegatingMapper.run(DelegatingMapper.java:40)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:625)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at com.zfabrik.hadoop.impl.HadoopProcessRunner.work(HadoopProcessRunner.java:107)
>>     ... 4 more
>> Caused by: java.lang.RuntimeException: me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
>>     at com.zfabrik.hadoop.impl.DelegatingMapper$1.run(DelegatingMapper.java:47)
>>     at com.zfabrik.work.WorkUnit.work(WorkUnit.java:342)
>>     at com.zfabrik.impl.launch.ProcessRunnerImpl.work(ProcessRunnerImpl.java:189)
>>     ... 9 more
>> Caused by: me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
>>     at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:88)
>>     at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>>     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:221)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:100)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:106)
>>     at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:203)
>>     at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:200)
>>     at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
>>     at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
>>     at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:200)
>>     at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper._flush(HectorBasedMassItemGenMapper.java:122)
>>     at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper.map(HectorBasedMassItemGenMapper.java:103)
>>     at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper.map(HectorBasedMassItemGenMapper.java:1)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>     at com.zfabrik.hadoop.impl.DelegatingMapper$1.run(DelegatingMapper.java:45)
>>     ... 11 more
>> Caused by: UnavailableException()
>>     at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16485)
>>     at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
>>     at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
>>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:93)
>>     ... 27 more
>> 
>> -------
>> Task attempt_201104291345_0001_m_000028_0 failed to report status for 602 seconds. Killing!
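
To make the tuning mentioned in this thread concrete (the wiki's Troubleshooting advice, the range-batch-size of 2048 from the top of the thread, and the 600-second task timeout behind the "failed to report status for 602 seconds" message), here is a rough sketch of where those knobs usually live. It assumes the 0.7-era org.apache.cassandra.hadoop.ConfigHelper and old-style mapred.* property names; the rpc_timeout itself is a per-node setting (rpc_timeout_in_ms in cassandra.yaml) and is not part of the job configuration:

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobTuningSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Fewer rows per get_range_slices call when reading from Cassandra
            // (the "range-batch-size" mentioned near the top of the thread; default 4096).
            ConfigHelper.setRangeBatchSize(conf, 2048);
            // Equivalent raw property, as named on the HadoopSupport wiki page:
            // conf.setInt("cassandra.range.batch.size", 2048);

            // Give long-running tasks more time before Hadoop kills them
            // ("failed to report status for 602 seconds"); default is 600000 ms.
            conf.setLong("mapred.task.timeout", 1200000L);

            Job job = new Job(conf, "item-generator");
            // ... mapper/reducer setup as in the original job ...
        }
    }

Raising mapred.task.timeout only keeps Hadoop from killing slow tasks; it does not address the underlying UnavailableException on batch_mutate.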
>> 
>> I also observed that when connecting with cassandra-cli during the stress test, it was not possible to list the items written so far:
>> 
>> [default@unknown] use ItemRepo;
>> Authenticated to keyspace: ItemRepo
>> [default@ItemRepo] list Items;
>> Using default limit of 100
>> Internal error processing get_range_slices
>> 
>> It seems to me that from the point I performed the read operation in the cli tool, the node somehow becomes confused.
>> Looking at jconsole shows that up to this point the heap is fine: it grows and GC clears it again.
>> But from this point on, GC doesn't really help anymore (see attached screenshot).
>> 
>> This also has an impact on the other nodes, as one can see in the second screenshot. The CPU usage goes down, as does the heap memory usage.
>> 
>> I'll run another stress test at the weekend with
>> 
>> * MAX_HEAP_SIZE="4G"
>> * HEAP_NEWSIZE="400M"
>> 
>> Best Regards
>> Udo
>> 
> 
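
As a side note on the jconsole observations above: the same heap numbers can also be read programmatically over JMX, which can help when watching an unattended weekend run. A minimal sketch using only standard JDK classes, assuming the JMX endpoint used by nodetool above (host tirdata1, port 28080) is reachable without authentication:

    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;
    import javax.management.JMX;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class HeapProbeSketch {
        public static void main(String[] args) throws Exception {
            // Same JMX host/port that nodetool uses above (-h tirdata1 -p 28080).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://tirdata1:28080/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();

            // Read the same heap usage that jconsole's Memory tab shows.
            MemoryMXBean memory = JMX.newMXBeanProxy(mbsc,
                    new ObjectName("java.lang:type=Memory"), MemoryMXBean.class);
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
            connector.close();
        }
    }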