crunch-dev mailing list archives

From "Stefan De Smit (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-586) SparkPipeline does not work with HBaseSourceTarget
Date Wed, 13 Jan 2016 12:47:39 GMT
Stefan De Smit created CRUNCH-586:
-------------------------------------

             Summary: SparkPipeline does not work with HBaseSourceTarget
                 Key: CRUNCH-586
                 URL: https://issues.apache.org/jira/browse/CRUNCH-586
             Project: Crunch
          Issue Type: Bug
          Components: Spark
    Affects Versions: 0.13.0
            Reporter: Stefan De Smit


final Pipeline pipeline = new SparkPipeline("local", "crunchhbase", HBaseInputSource.class, conf);
final PTable<ImmutableBytesWritable, Result> read = pipeline.read(new HBaseSourceTarget("t1", new Scan()));

This returns an empty table, while the same code works with MRPipeline.
The root cause is the combination of Spark's getJavaRDDLike method:

source.configureSource(job, -1);
Converter converter = source.getConverter();
JavaPairRDD<?, ?> input = runtime.getSparkContext().newAPIHadoopRDD(
    job.getConfiguration(),
    CrunchInputFormat.class,
    converter.getKeyClass(),
    converter.getValueClass());
That call assumes CrunchInputFormat.class (and always passes -1 as the inputId),
and the HBase configureSource method:

if (inputId == -1) {
  job.setMapperClass(CrunchMapper.class);
  job.setInputFormatClass(inputBundle.getFormatClass());
  inputBundle.configure(conf);
} else {
  Path dummy = new Path("/hbase/" + table);
  CrunchInputs.addInputPath(job, dummy, inputBundle, inputId);
}

The easiest solution I see is to always call CrunchInputs.addInputPath, in every source.
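
To illustrate the mismatch, here is a minimal toy model in plain Java. It is a sketch, not Crunch code: ToyJob, addInputPath, readThroughWrapperFormat and the "input.format" key are hypothetical stand-ins. It assumes, as the description above implies, that a wrapper input format only sees inputs that were registered on the job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Crunch, Spark or HBase classes.
public class InputRegistrationSketch {

    // Hypothetical stand-in for a Hadoop job and its configuration.
    public static final class ToyJob {
        public final Map<String, String> conf = new HashMap<>();
        public final List<String> registeredInputs = new ArrayList<>();
    }

    // Hypothetical stand-in for registering an input on the job.
    public static void addInputPath(ToyJob job, String path, int inputId) {
        job.registeredInputs.add(inputId + ":" + path);
    }

    // Mirrors the branch quoted in the report: inputId == -1 sets the
    // native input format directly and skips registration.
    public static void configureSourceAsReported(ToyJob job, String table, int inputId) {
        if (inputId == -1) {
            job.conf.put("input.format", "native-format-placeholder");
        } else {
            addInputPath(job, "/hbase/" + table, inputId);
        }
    }

    // Proposed behaviour: register the input regardless of inputId.
    public static void configureSourceAlwaysRegister(ToyJob job, String table, int inputId) {
        addInputPath(job, "/hbase/" + table, inputId);
    }

    // Assumption: the wrapper format ignores "input.format" and reads
    // only what was registered, so an unregistered source yields nothing.
    public static List<String> readThroughWrapperFormat(ToyJob job) {
        return new ArrayList<>(job.registeredInputs);
    }

    public static void main(String[] args) {
        ToyJob reported = new ToyJob();
        configureSourceAsReported(reported, "t1", -1);
        System.out.println("as reported:     " + readThroughWrapperFormat(reported));

        ToyJob fixed = new ToyJob();
        configureSourceAlwaysRegister(fixed, "t1", -1);
        System.out.println("always register: " + readThroughWrapperFormat(fixed));
    }
}
```

In this model the reported branch leaves nothing for the wrapper format to read when inputId is -1, while the always-register variant exposes the table. Whether -1 is an acceptable id for the real CrunchInputs.addInputPath is not something this sketch establishes.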



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
