crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-586) SparkPipeline does not work with HBaseSourceTarget
Date Tue, 23 Feb 2016 17:09:18 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159198#comment-15159198 ]

Josh Wills commented on CRUNCH-586:
-----------------------------------

I was thinking the write-side seemed hard and I was hoping someone else would solve it for me? ;-)

> SparkPipeline does not work with HBaseSourceTarget
> --------------------------------------------------
>
>                 Key: CRUNCH-586
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-586
>             Project: Crunch
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 0.13.0
>            Reporter: Stefan De Smit
>            Assignee: Josh Wills
>         Attachments: CRUNCH-586.patch
>
>
> final Pipeline pipeline = new SparkPipeline("local", "crunchhbase", HBaseInputSource.class, conf);
> final PTable<ImmutableBytesWritable, Result> read = pipeline.read(new HBaseSourceTarget("t1", new Scan()));
> This returns an empty table, while the same code works with MRPipeline.
> The root cause is the combination of Spark's getJavaRDDLike method:
> source.configureSource(job, -1);
> Converter converter = source.getConverter();
> JavaPairRDD<?, ?> input = runtime.getSparkContext().newAPIHadoopRDD(
>     job.getConfiguration(),
>     CrunchInputFormat.class,
>     converter.getKeyClass(),
>     converter.getValueClass());
> That code assumes CrunchInputFormat.class (and always passes -1 as the input ID),
> and the HBase source's configureSource method:
> if (inputId == -1) {
>   job.setMapperClass(CrunchMapper.class);
>   job.setInputFormatClass(inputBundle.getFormatClass());
>   inputBundle.configure(conf);
> } else {
>   Path dummy = new Path("/hbase/" + table);
>   CrunchInputs.addInputPath(job, dummy, inputBundle, inputId);
> }
> The easiest solution I see is to always call CrunchInputs.addInputPath in every source.
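The reporter's diagnosis can be illustrated with a small, dependency-free Java model. This is a hypothetical sketch only: `Crunch586Model`, `FakeJob`, `configureSourceOriginal`, `configureSourceFixed`, and `inputsVisibleToSpark` are invented names that mimic the quoted control flow, and they are not Crunch, Spark, or Hadoop APIs. The model assumes, as the report implies, that CrunchInputFormat only discovers inputs registered through CrunchInputs.addInputPath and ignores an input format set directly on the job.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, dependency-free model of the CRUNCH-586 interaction.
 * FakeJob and the static methods below are illustrative stand-ins that
 * mimic the quoted control flow; they are NOT Crunch, Spark, or Hadoop APIs.
 */
public class Crunch586Model {

    /** Stand-in for a Hadoop Job: a directly configured input format plus
     *  the inputs that CrunchInputs.addInputPath would register. */
    public static final class FakeJob {
        public String inputFormatClass = null;
        public final List<String> crunchInputs = new ArrayList<>();
    }

    /** Mirrors the quoted configureSource branching of the HBase source. */
    public static void configureSourceOriginal(FakeJob job, String table, int inputId) {
        if (inputId == -1) {
            // Single-input path: the input format is set directly on the job,
            // so nothing is registered with CrunchInputs.
            job.inputFormatClass = "TableInputFormat"; // placeholder name only
        } else {
            // Multi-input path: a dummy path is registered with CrunchInputs.
            job.crunchInputs.add("/hbase/" + table);
        }
    }

    /** The reporter's suggestion: always register through CrunchInputs.
     *  In this model inputId no longer changes the behavior. */
    public static void configureSourceFixed(FakeJob job, String table, int inputId) {
        job.crunchInputs.add("/hbase/" + table);
    }

    /** Mirrors the Spark side: it always passes -1 and reads through
     *  CrunchInputFormat, which (in this model) only sees registered inputs
     *  and never looks at job.inputFormatClass. */
    public static List<String> inputsVisibleToSpark(FakeJob job) {
        return job.crunchInputs;
    }

    public static void main(String[] args) {
        FakeJob before = new FakeJob();
        configureSourceOriginal(before, "t1", -1);
        // prints "original: 0 input(s)", i.e. the PTable comes back empty
        System.out.println("original: " + inputsVisibleToSpark(before).size() + " input(s)");

        FakeJob after = new FakeJob();
        configureSourceFixed(after, "t1", -1);
        // prints "fixed: 1 input(s)"
        System.out.println("fixed: " + inputsVisibleToSpark(after).size() + " input(s)");
    }
}
```

In this model, the Spark path never consults the input format set directly on the job; only the CrunchInputs registration reaches CrunchInputFormat. Routing every source through that registration therefore lets the MapReduce and Spark runtimes see the same inputs.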




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
