cassandra-user mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: Experiences with Map&Reduce Stress Tests
Date Fri, 29 Apr 2011 16:53:33 GMT
It sounds like there might be some tuning you can do to your jobs - take a look at the wiki's
HadoopSupport page, specifically the Troubleshooting section:
http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
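
For example, two knobs that often help with write-heavy jobs like this are the task timeout and the write batch size. A rough sketch only, assuming Hadoop 0.20-style property names and that your mapper reads its batch size from the job configuration (the "itemgen.batch.size" property is made up here for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class StressJobTuning {
        public static Job createJob() throws Exception {
            Configuration conf = new Configuration();
            // Give slow batch flushes more headroom than the default 600s before the
            // tasktracker kills the attempt ("failed to report status for 602 seconds").
            conf.setInt("mapred.task.timeout", 1200000); // milliseconds
            // Hypothetical job parameter: write smaller chunks than 1000 items per
            // batch_mutate so a single batch puts less pressure on one node.
            conf.setInt("itemgen.batch.size", 100);
            return new Job(conf, "cassandra-item-generation");
        }
    }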

On Apr 29, 2011, at 11:45 AM, Subscriber wrote:

> Hi all, 
> 
> We want to share the experiences we gathered during our evaluation of Cassandra together with Hadoop Map/Reduce.
> Our question was whether Cassandra is suitable for massive distributed data writes driven by Hadoop's Map/Reduce feature.
> 
> Our setup is described in the attached file 'cassandra_stress_setup.txt'.
> 
> <cassandra_stress_setup.txt>
> 
> The stress test uses 800 map tasks to generate data and store it in Cassandra.
> Each map task writes 500,000 items (i.e. rows), resulting in 400,000,000 items in total.
> 
> At most 8 map tasks run in parallel on each node. An item contains (besides the key) two long and two double values,
> so each item is a few hundred bytes in size (roughly 300 bytes on average), which leads to a total data size of approximately 120 GB.
> 
> The map tasks use the Hector API. Hector is fed all three data nodes. The data is written in chunks of 1000 items.
> The ConsistencyLevel is set to ONE.
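
For reference, that write path looks roughly like the following - a minimal sketch against a Hector 0.7.x-era API, not your actual mapper. The keyspace and column family names are taken from the cli session further down; the cluster name, row keys, and column layout are simplified and purely illustrative:

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class ItemWriterSketch {
        public static void main(String[] args) {
            // Hector is handed all three data nodes.
            Cluster cluster = HFactory.getOrCreateCluster("StressCluster",
                    "192.168.11.198:9160,192.168.11.199:9160,192.168.11.202:9160");

            // All writes go out at ConsistencyLevel ONE.
            ConfigurableConsistencyLevel cl = new ConfigurableConsistencyLevel();
            cl.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);
            Keyspace keyspace = HFactory.createKeyspace("ItemRepo", cluster, cl);

            Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());

            // One map task writes 500,000 items, flushed in chunks of 1,000 per batch_mutate.
            for (int i = 1; i <= 500000; i++) {
                String key = "item-" + i;
                // Simplified payload; the real items carry two longs and two doubles.
                mutator.addInsertion(key, "Items", HFactory.createStringColumn("long1", Long.toString(i)));
                mutator.addInsertion(key, "Items", HFactory.createStringColumn("double1", Double.toString(i * 0.5)));
                if (i % 1000 == 0) {
                    mutator.execute(); // flush the chunk
                }
            }
            mutator.execute(); // flush any remainder
        }
    }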
> 
> We ran the stress test several times with different configuration settings (for example, one run used Cassandra's default configuration, and another run used Pelops instead of Hector).
> 
> Our observations are like this:
> 
> 1) Cassandra is really fast - we are very impressed by the huge write throughput. A map task writing 500,000 items (approx. 200 MB) usually finishes in under 5 minutes.
> 2) However, unfortunately, all tests failed in the end.
> 
> In the beginning there are no problems. The first 100 (in some tests even the first 300!) map tasks look fine. But then the trouble starts.
> 
> Hadoop's sample output after ~15 minutes:
> 
> Kind	% Complete	Num Tasks	Pending	Running	Complete	Killed	Failed/Killed Task Attempts
> map	14.99%		800		680	24	96		0	0 / 0
> reduce	3.99%		1		0	1	0		0	0 / 0
> 
> Some stats:
> >top
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 31159 xxxx      20   0 2569m 2.2g 9.8m S  450 18.6  61:44.73 java
> 
> >vmstat 1 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  2  1  36832 353688 242820 6837520    0    0    15    73    3    2  3  0 96  0
> 11  1  36832 350992 242856 6852136    0    0  1024 20900 4508 11738 19  1 74  6
>  8  0  36832 339728 242876 6859828    0    0     0  1068 45809 107008 69 10 20  0
>  1  0  36832 330212 242884 6868520    0    0     0    80 42112 92930 71  8 21  0
>  2  0  36832 311888 242908 6887708    0    0  1024     0 20277 46669 46  7 47  0
> 
> >cassandra/bin/nodetool -h tirdata1 -p 28080 ring
> Address         Status State   Load            Owns    Token
>                                                        113427455640312821154458202477256070484
> 192.168.11.198  Up     Normal  6.72 GB         33.33%  0
> 192.168.11.199  Up     Normal  6.72 GB         33.33%  56713727820156410577229101238628035242
> 192.168.11.202  Up     Normal  6.68 GB         33.33%  113427455640312821154458202477256070484
> 
> 
> Hadoop's sample output after ~20 minutes:
> 
> Kind	% Complete	Num Tasks	Pending	Running	Complete	Killed	Failed/Killed Task Attempts
> map	15.49%		800		673	24	103		0	6 / 0
> reduce	4.16%		1		0	1	0		0	0 / 0
> 
> What went wrong? It's always the same. The clients cannot reach the nodes anymore. 
> 
> java.lang.RuntimeException: work failed
> 	at com.zfabrik.hadoop.impl.HadoopProcessRunner.work(HadoopProcessRunner.java:109)
> 	at com.zfabrik.hadoop.impl.DelegatingMapper.run(DelegatingMapper.java:40)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:625)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.reflect.InvocationTargetException
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at com.zfabrik.hadoop.impl.HadoopProcessRunner.work(HadoopProcessRunner.java:107)
> 	... 4 more
> Caused by: java.lang.RuntimeException: me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
> 	at com.zfabrik.hadoop.impl.DelegatingMapper$1.run(DelegatingMapper.java:47)
> 	at com.zfabrik.work.WorkUnit.work(WorkUnit.java:342)
> 	at com.zfabrik.impl.launch.ProcessRunnerImpl.work(ProcessRunnerImpl.java:189)
> 	... 9 more
> Caused by: me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
> 	at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:88)
> 	at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
> 	at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:221)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:100)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:106)
> 	at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:203)
> 	at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:200)
> 	at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
> 	at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
> 	at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:200)
> 	at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper._flush(HectorBasedMassItemGenMapper.java:122)
> 	at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper.map(HectorBasedMassItemGenMapper.java:103)
> 	at sample.cassandra.itemrepo.mapreduce.HectorBasedMassItemGenMapper.map(HectorBasedMassItemGenMapper.java:1)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at com.zfabrik.hadoop.impl.DelegatingMapper$1.run(DelegatingMapper.java:45)
> 	... 11 more
> Caused by: UnavailableException()
> 	at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16485)
> 	at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
> 	at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:93)
> 	... 27 more
> 
> -------
> Task attempt_201104291345_0001_m_000028_0 failed to report status for 602 seconds. Killing!
> 
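
Two things worth trying in the mapper itself, sketched below as an illustration only (this is not your actual HectorBasedMassItemGenMapper; the class name, generic types, field, and retry policy are made up): catch the HUnavailableException on flush and retry with a short backoff, and report progress to Hadoop while doing so, so that a slow or retried flush doesn't run into the task timeout seen above.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Mapper;
    import me.prettyprint.hector.api.exceptions.HUnavailableException;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class RetryingItemGenMapper
            extends Mapper<LongWritable, LongWritable, NullWritable, NullWritable> {

        private Mutator<String> mutator; // set up in setup(), as sketched earlier

        /** Flush the current batch, retrying on UnavailableException with backoff. */
        private void flushWithRetry(Context context) throws IOException, InterruptedException {
            int attempts = 0;
            while (true) {
                try {
                    mutator.execute();
                    return;
                } catch (HUnavailableException e) {
                    if (++attempts > 5) {
                        throw new IOException("giving up after " + attempts + " attempts", e);
                    }
                    // Tell the tasktracker we are still alive so the attempt is not
                    // killed for failing to report status, then back off briefly.
                    context.progress();
                    Thread.sleep(1000L * attempts);
                }
            }
        }
    }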
> 
> I also observed that when connecting with cassandra-cli during the stress test, it was not possible to list the items written so far:
> 
> [default@unknown] use ItemRepo;
> Authenticated to keyspace: ItemRepo
> [default@ItemRepo] list Items;
> Using default limit of 100
> Internal error processing get_range_slices
> 
> 
> It seems to me that from the point at which I performed the read operation in the cli tool, the node somehow becomes confused.
> Looking at jconsole shows that up to this point the heap behaves well: it grows and GC clears it again.
> But from this point on, GC doesn't really help anymore (see attached screenshot).
> 
> 
> <Bildschirmfoto 2011-04-29 um 18.30.14.png>
> 
> 
> This also has an impact on the other nodes, as one can see in the second screenshot: CPU usage drops, and so does heap memory usage.
> <Bildschirmfoto 2011-04-29 um 18.36.34.png>
> 
> I'll run another stress test at the weekend with
> 
>  * MAX_HEAP_SIZE="4G"
>  * HEAP_NEWSIZE="400M"
> 
> Best Regards
> Udo
> 

