spark-dev mailing list archives

From Ted Malaska <ted.mala...@cloudera.com>
Subject Re: Generalised Spark-HBase integration
Date Tue, 28 Jul 2015 16:14:45 GMT
Yup, you should be able to do that with the APIs that are going into HBase.

Let me know if you need to chat about the problem and how to implement it
with the HBase APIs.

We have tried to cover every possible way to use HBase with Spark.  Let us
know if we missed anything; if we did, we will add it.

On Tue, Jul 28, 2015 at 12:12 PM, Michal Haris <michal.haris@visualdna.com>
wrote:

> Hi Ted, yes, the Cloudera blog and your code were my starting point, but I
> needed something more Spark-centric rather than HBase-centric. Basically, I
> am doing a lot of ad-hoc transformations with RDDs that are based on HBase
> tables and then mutating them after a series of iterative (BSP-like) steps.
>
> On 28 July 2015 at 17:06, Ted Malaska <ted.malaska@cloudera.com> wrote:
>
>> Thanks Michal,
>>
>> Just to share what I'm working on in a related topic: a long time ago
>> I built SparkOnHBase and put it into Cloudera Labs, at this link:
>> http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
>>
>> Also, recently I have been working on getting this into HBase core.  It
>> will hopefully be in HBase core within the next couple of weeks.
>>
>> https://issues.apache.org/jira/browse/HBASE-13992
>>
>> Then I'm planning on adding DataFrame and bulk load support through
>>
>> https://issues.apache.org/jira/browse/HBASE-14149
>> https://issues.apache.org/jira/browse/HBASE-14150
>>
>> Also, if you are interested, this is running today at at least half a
>> dozen companies with Spark Streaming.  Here is one blog post about a
>> successful implementation:
>>
>>
>> http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
>>
>> Here is an additional example blog post I put together:
>>
>>
>> http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
>>
>> Let me know if you have any questions, and let me know if you want to
>> connect and join efforts.
>>
>> Ted Malaska
>>
>> On Tue, Jul 28, 2015 at 11:59 AM, Michal Haris <
>> michal.haris@visualdna.com> wrote:
>>
>>> Hi all, for the last couple of months I've been working on large graph
>>> analytics and along the way have written from scratch an HBase-Spark
>>> integration, as none of the ones out there worked either in terms of scale
>>> or in the way they integrated with the RDD interface. This week I have
>>> generalised it into an (almost) Spark module, which works with the latest
>>> Spark and the new HBase API, so... sharing! :
>>> https://github.com/michal-harish/spark-on-hbase
>>>
>>>
>>> --
>>> Michal Haris
>>> Technical Architect
>>> direct line: +44 (0) 207 749 0229
>>> www.visualdna.com | t: +44 (0) 207 734 7033
>>> 31 Old Nichol Street
>>> London
>>> E2 7HR
>>>
>>
>>
>
>
> --
> Michal Haris
> Technical Architect
> direct line: +44 (0) 207 749 0229
> www.visualdna.com | t: +44 (0) 207 734 7033
> 31 Old Nichol Street
> London
> E2 7HR
>
