cassandra-commits mailing list archives

From "Patricio Echague (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-6169) Too many splits causes a "OutOfMemoryError: unable to create new native thread" in AbstractColumnFamilyInputFormat
Date Wed, 09 Oct 2013 00:29:42 GMT
Patricio Echague created CASSANDRA-6169:

             Summary: Too many splits causes a "OutOfMemoryError: unable to create new native
thread" in AbstractColumnFamilyInputFormat
                 Key: CASSANDRA-6169
             Project: Cassandra
          Issue Type: Bug
         Environment: 1.2.10
vnodes (server side)
Mac OS x (client)
            Reporter: Patricio Echague
            Priority: Minor

The problem is caused by having 2300+ tokens due to vnodes.

In the client side I get this exception

Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(
	at java.util.concurrent.ThreadPoolExecutor.addWorker(
	at java.util.concurrent.ThreadPoolExecutor.execute(
	at java.util.concurrent.AbstractExecutorService.submit(
	at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(
	at org.apache.hadoop.mapred.JobClient.writeSplits(
	at org.apache.hadoop.mapred.JobClient.access$700(
	at org.apache.hadoop.mapred.JobClient$
	at org.apache.hadoop.mapred.JobClient$
	at Method)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(
	at org.apache.hadoop.mapreduce.Job.submit(
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(
	at com.relateiq.hadoop.cassandra.etl.CassandraETLJob.main(

The problem seems to be in AbstractColumnFamilyInputFormat, around line 180, which creates a thread pool with no effective upper bound (the maximum pool size is Integer.MAX_VALUE):
ExecutorService executor = Executors.newCachedThreadPool();

Followed by:
            for (TokenRange range : masterRangeNodes)
                if (jobRange == null)
                    // for each range, pick a live owner and ask it to compute bite-sized splits
                    splitfutures.add(executor.submit(new SplitCallable(range, conf)));

which is executed once per token range and creates a new thread for each submission.
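For reference (this is from the JDK docs, not the Cassandra code): newCachedThreadPool is documented as equivalent to the construction below, which is why every submit that finds no idle worker starts a brand-new thread. A quick sketch to illustrate the unbounded maximum:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class CachedPoolDemo {
    public static void main(String[] args) {
        // Per the JDK docs, newCachedThreadPool() is equivalent to
        // new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
        //                        new SynchronousQueue<Runnable>()).
        // A SynchronousQueue holds no tasks, so any submit with no idle
        // worker available spawns a fresh thread.
        ExecutorService executor = Executors.newCachedThreadPool();
        ThreadPoolExecutor pool = (ThreadPoolExecutor) executor;
        System.out.println(pool.getMaximumPoolSize()); // 2147483647 (Integer.MAX_VALUE)
        executor.shutdown();
    }
}
```

With 2300+ token ranges submitted at once, that means 2300+ native threads, which is what blows up on the client.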

The easy fix, unless there is a longer-term fix I'm unaware of, would be to set an upper limit on the thread pool.

Something like this:
ExecutorService executor = new ThreadPoolExecutor(0, ConfigHelper.getMaxConcurrentSplitsResolution(),
60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
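One caveat with the snippet above (the getMaxConcurrentSplitsResolution() config knob is proposed here and doesn't exist yet): a ThreadPoolExecutor with a bounded maximum over a SynchronousQueue throws RejectedExecutionException once all workers are busy, since that queue holds no tasks. A fixed-size pool over an unbounded LinkedBlockingQueue queues the excess splits instead. A minimal sketch, using a hypothetical cap of 128 in place of the config call:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedSplitPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical cap standing in for ConfigHelper.getMaxConcurrentSplitsResolution().
        int maxThreads = 128;

        // Fixed-size pool; submissions beyond the cap wait in the queue
        // rather than spawning threads or being rejected.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        executor.allowCoreThreadTimeOut(true); // let idle workers die off afterwards

        for (int i = 0; i < 2300; i++) { // one task per vnode token range
            executor.submit(new Runnable() {
                public void run() { /* stand-in for SplitCallable */ }
            });
        }

        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(executor.getLargestPoolSize() <= maxThreads); // true
    }
}
```

Either shape caps native thread creation; the queueing variant just trades rejections for latency on the tail of the split requests.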

Shall I proceed with a patch?

This message was sent by Atlassian JIRA
