crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: HFileOutputFormatForCrunch with spark pipeline
Date Thu, 13 Aug 2015 05:17:21 GMT
Tracking here: https://issues.apache.org/jira/browse/CRUNCH-556

On Wed, Aug 12, 2015 at 8:10 PM, Josh Wills <jwills@cloudera.com> wrote:

> Hey Surbhi,
>
> I think it's just a bug-- Crunch-on-Spark should be handling the
> partitioner stuff correctly w/o requiring you to write your own. I think
> the problem is that we set the location of the partition file (the one that
> the code is mad that it can't find in your gist) inside of the
> GroupingOptions class, and we're not updating the Configuration object that
> the Spark job is going to use w/the location of that file in the same way
> we do on MapReduce. I'll file a bug for it and see if I can't come up w/a
> fix and unit test tomorrow.
>
> Thanks!
> Josh
>
> On Wed, Aug 12, 2015 at 10:45 AM, Surbhi Mungre <mungre.surbhi@gmail.com>
> wrote:
>
>> I am converting an MRPipeline to a SparkPipeline following these[1] instructions.
>> My SparkPipeline fails with this[2] exception. In my pipeline I am trying
>> to write to HBase using HFiles. IIUC, the M/R job that creates HFiles uses a
>> custom partitioner. I am not sure how Crunch translates this to Spark. From
>> the exception stack trace, it looks like Spark is using the M/R partitioner. I
>> am completely new to Spark, but I think I will have to create a custom Spark
>> partitioner and use it instead. When I convert an MRPipeline to a
>> SparkPipeline and an M/R job uses a custom partitioner, will Crunch handle it?
>>
>>
>> [1]
>> http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_running_crunch_with_spark.html
>>
>> [2] https://gist.github.com/anonymous/920c000f20229eaa76d8
>>
>> Thanks,
>> Surbhi
>>
>>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
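
For readers following the thread, the failure mode described in the quoted reply above can be sketched roughly as follows. This is only an illustration in plain Java, not Crunch's actual code: GroupingOpts, locatePartitionFile, and the two buildConf methods are hypothetical stand-ins, a java.util.Map stands in for the Hadoop Configuration, and the path /tmp/partitions.seq is made up. The property key is assumed to be the one Hadoop's TotalOrderPartitioner reads; the thread does not say which partitioner or key is involved.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionFileSketch {
    // Assumption: the key Hadoop's TotalOrderPartitioner reads its partition file path from.
    static final String PARTITIONER_PATH = "mapreduce.totalorderpartitioner.path";

    /** Hypothetical stand-in for GroupingOptions: carries extra settings for the shuffle. */
    static final class GroupingOpts {
        final Map<String, String> extraConf = new HashMap<>();

        GroupingOpts conf(String key, String value) {
            extraConf.put(key, value);
            return this;
        }
    }

    /** Hypothetical stand-in for a partitioner looking up its partition file. */
    static String locatePartitionFile(Map<String, String> jobConf) {
        String path = jobConf.get(PARTITIONER_PATH);
        if (path == null) {
            throw new IllegalStateException(
                "Can't find partition file: " + PARTITIONER_PATH + " is not set");
        }
        return path;
    }

    /** Models the described bug: the grouping options never reach the job configuration. */
    static Map<String, String> buildConfWithoutMerge(Map<String, String> base, GroupingOpts opts) {
        return new HashMap<>(base);
    }

    /** Models the intended behavior: copy the grouping options into the job configuration. */
    static Map<String, String> buildConfWithMerge(Map<String, String> base, GroupingOpts opts) {
        Map<String, String> conf = new HashMap<>(base);
        conf.putAll(opts.extraConf);
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        GroupingOpts opts = new GroupingOpts().conf(PARTITIONER_PATH, "/tmp/partitions.seq");

        // With the merge, the partitioner finds the file location.
        System.out.println(locatePartitionFile(buildConfWithMerge(base, opts)));

        // Without it, the lookup fails, similar in spirit to the exception in the gist.
        try {
            locatePartitionFile(buildConfWithoutMerge(base, opts));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this reading is right, the fix tracked in CRUNCH-556 would amount to doing the equivalent of buildConfWithMerge on the Spark side, so a user-written Spark partitioner should not be needed.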



