cassandra-user mailing list archives

From Jian Fang <jian.fang.subscr...@gmail.com>
Subject java.io.IOException: Could not get input splits
Date Thu, 01 Sep 2011 15:54:45 GMT
Hi,

I upgraded Cassandra from 0.8.2 to 0.8.4 and ran a Hadoop job to read data
from Cassandra, but got the following error:

11/09/01 11:42:46 INFO hadoop.SalesRankLoader: Start Cassandra reader...
Exception in thread "main" java.io.IOException: Could not get input splits
    at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:157)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at com.barnesandnoble.hadoop.SalesRankLoader.run(SalesRankLoader.java:359)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at com.barnesandnoble.hadoop.SalesRankLoader.main(SalesRankLoader.java:408)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: protocol = socket host = null
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:153)
    ... 12 more
Caused by: java.lang.IllegalArgumentException: protocol = socket host = null
    at sun.net.spi.DefaultProxySelector.select(DefaultProxySelector.java:151)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:358)
    at java.net.Socket.connect(Socket.java:529)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
    at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
    at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.createConnection(ColumnFamilyInputFormat.java:243)
    at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:217)
    at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:70)
    at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:190)
    at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:175)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

The code worked with 0.8.2, and it is really strange to see host = null.
My code is very similar to the word count example:

        logger.info("Start Cassandra reader...");
        Job job2 = new Job(getConf(), "SalesRankCassandraReader");
        job2.setJarByClass(SalesRankLoader.class);
        job2.setMapperClass(CassandraReaderMapper.class);
        job2.setReducerClass(CassandraToFilesystem.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(IntWritable.class);
        job2.setMapOutputKeyClass(Text.class);
        job2.setMapOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job2, new Path(outPath));

        job2.setInputFormatClass(ColumnFamilyInputFormat.class);

        ConfigHelper.setRpcPort(job2.getConfiguration(), "9260");
        ConfigHelper.setInitialAddress(job2.getConfiguration(), "dnjsrcha02");
        ConfigHelper.setPartitioner(job2.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
        ConfigHelper.setInputColumnFamily(job2.getConfiguration(), KEYSPACE, columnFamily);
//        ConfigHelper.setInputSplitSize(job2.getConfiguration(), 5000);
        ConfigHelper.setRangeBatchSize(job2.getConfiguration(), batchSize);
        SlicePredicate predicate = new SlicePredicate().setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
        ConfigHelper.setInputSlicePredicate(job2.getConfiguration(), predicate);

        job2.waitForCompletion(true);
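
For reference, the "protocol = socket host = null" message is raised in
DefaultProxySelector.select, which the trace shows being called from
SocksSocketImpl.connect. The sketch below is only an illustration of how a
host can come out as null at that point. It assumes the JDK builds a
socket://host:port URI from the address being connected to (an assumption
drawn from the frames in the trace, not something confirmed here). The class
SocketUriCheck, the helper hostOf, and the name "bad_host" are hypothetical
and are not part of Cassandra or Hadoop. It does not show that this is the
cause of the failure above.

```java
import java.net.URI;

public class SocketUriCheck {
    // Hypothetical helper: returns the host that java.net.URI extracts from
    // a socket://host:port URI, or null when the name is not a valid URI
    // host (for example, when it contains an underscore).
    static String hostOf(String hostName, int port) {
        return URI.create("socket://" + hostName + ":" + port).getHost();
    }

    public static void main(String[] args) {
        // The initial address from the job configuration parses as a host.
        System.out.println(hostOf("dnjsrcha02", 9260));
        // A made-up name with an underscore parses, but yields no host.
        System.out.println(hostOf("bad_host", 9260));
    }
}
```

Running the same check against the addresses that the cluster reports for
each node may help tell whether one of them is an unusable host name.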

The Cassandra cluster has 6 nodes, and I am pretty sure they are working fine.

Please help.

Thanks,

John
