crunch-user mailing list archives

From Surbhi Mungre <mungre.sur...@gmail.com>
Subject HFileOutputFormatForCrunch with spark pipeline
Date Wed, 12 Aug 2015 17:45:45 GMT
I am converting an MRPipeline to a SparkPipeline by following these[1]
instructions. My SparkPipeline fails with this[2] exception. In my pipeline
I am trying to write to HBase using HFiles. IIUC, the M/R job that creates
the HFiles uses a custom partitioner, and I am not sure how Crunch
translates this to Spark. From the exception stack trace, it looks like
Spark is using the M/R partitioner. I am completely new to Spark, but I
think I will have to create a custom Spark partitioner and use it instead.
When converting an MRPipeline to a SparkPipeline, will Crunch handle an M/R
job that uses a custom partitioner?
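
For reference, a minimal sketch of the kind of range (total-order)
partitioning that an HFile-writing job typically relies on: each row key is
routed to the partition whose key range contains it, so that every output
file covers one contiguous range. This is plain Java and not Crunch or
Spark API; the class name RangePartitionSketch and the split points passed
to it are hypothetical, and it assumes the split points are already sorted
in unsigned byte order.

```java
/** Hypothetical illustration only; not part of Crunch, Spark, or HBase. */
public class RangePartitionSketch {
    // Sorted split points; partition i holds keys in [split[i-1], split[i]).
    private final byte[][] splitPoints;

    public RangePartitionSketch(byte[][] sortedSplitPoints) {
        this.splitPoints = sortedSplitPoints;
    }

    /** One more partition than there are split points. */
    public int numPartitions() {
        return splitPoints.length + 1;
    }

    /** Returns the number of split points that are <= rowKey (binary search). */
    public int getPartition(byte[] rowKey) {
        int lo = 0;
        int hi = splitPoints.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(splitPoints[mid], rowKey) <= 0) {
                lo = mid + 1;   // split point is at or below the key: go right
            } else {
                hi = mid;       // split point is above the key: go left
            }
        }
        return lo;
    }

    /** Lexicographic comparison treating bytes as unsigned values. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

If a custom Spark partitioner does turn out to be necessary, logic like
this would presumably live in a subclass of org.apache.spark.Partitioner,
which exposes numPartitions() and getPartition(Object key); whether Crunch
already does this internally is exactly what I am unsure about.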


[1]
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_running_crunch_with_spark.html

[2] https://gist.github.com/anonymous/920c000f20229eaa76d8

Thanks,
Surbhi
