accumulo-user mailing list archives

From Donald Miner <dmi...@clearedgeit.com>
Subject Re: Accumulo / HBase migration
Date Tue, 09 Jul 2013 17:49:11 GMT
Keith,

I think TableInputFormat [1] does what we want for this on the HBase side. It
sets up one mapper per region and reads each region out in sorted order.
However, I don't know of an offline mapreduce option for HBase, and Googling
didn't turn anything up.
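
For anyone curious, here's roughly the shape of the read side I have in mind.
This is only an untested sketch: the class and table names are placeholders,
and it assumes HBase 0.94-ish and Accumulo 1.4/1.5-ish APIs.

import java.io.IOException;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class HBaseToAccumuloCopy {

  // Reads through the HBase client API, so deletes and versions are already
  // resolved; each mapper sees one region's rows in sorted key order.
  public static class CopyMapper extends TableMapper<Key, Value> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context ctx)
        throws IOException, InterruptedException {
      for (KeyValue kv : result.raw()) {
        // Empty column visibility here; fill in if cells need visibilities.
        Key k = new Key(kv.getRow(), kv.getFamily(), kv.getQualifier(),
            new byte[0], kv.getTimestamp());
        ctx.write(k, new Value(kv.getValue()));
      }
    }
  }

  public static Job createJob(Configuration conf) throws IOException {
    Job job = new Job(conf, "hbase-to-accumulo");
    job.setJarByClass(HBaseToAccumuloCopy.class);

    Scan scan = new Scan();
    scan.setCaching(1000);       // bigger scan batches for a bulk copy
    scan.setCacheBlocks(false);  // don't churn the region server block cache

    // "source_table" is a placeholder for the HBase table being migrated.
    TableMapReduceUtil.initTableMapperJob("source_table", scan,
        CopyMapper.class, Key.class, Value.class, job);
    job.setNumReduceTasks(0);    // map-only: one sorted rfile per region
    return job;
  }
}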

This was already my preferred approach (and thanks to Christopher and you
for the extra validation). I like that if you make the regions and tablets
match up 1:1, you don't need to do any partitioning work to create the sorted
rfiles. I'll probably go forward with this.
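
Concretely, the load side might then look something like this, continuing from
the job above (again untested, with placeholder paths and table names, an
already-created Accumulo Connector, and error handling omitted): pre-split the
table on the HBase region boundaries, write the rfiles with
AccumuloFileOutputFormat, and bulk import them.

// Pre-split the Accumulo table on the HBase region start keys so that
// regions and tablets line up 1:1 (the first region's empty start key
// is skipped).
SortedSet<Text> splits = new TreeSet<Text>();
HTable source = new HTable(conf, "source_table");
for (byte[] startKey : source.getStartKeys()) {
  if (startKey.length > 0) {
    splits.add(new Text(startKey));
  }
}
connector.tableOperations().addSplits("dest_table", splits);

// Each mapper writes its sorted output as an rfile.
job.setOutputFormatClass(AccumuloFileOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("/tmp/hbase-migration/files"));

// Once the job succeeds, bulk load the rfiles into the pre-split table.
connector.tableOperations().importDirectory("dest_table",
    "/tmp/hbase-migration/files", "/tmp/hbase-migration/failures", false);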

Thanks!

-Don


[1]
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html


On Tue, Jul 9, 2013 at 1:39 PM, Keith Turner <keith@deenlo.com> wrote:

> I suspect that translating individual files directly would be tricky.  If
> one HFile has a delete entry that suppresses keys in another HFile, that
> needs to be handled properly.  If you read the data through the HBase API,
> it will be handled for you.  Assuming you can read sorted ranges per mapper,
> it seems like the following could be done:
>
> HBase -> map-only mapreduce that creates an rfile per mapper -> Accumulo
> bulk load -> Accumulo
>
> If HBase has something like Accumulo's offline mapreduce, this could be done
> more efficiently while still using the full HBase API to read the data.
>
> Keith
>
> On Tue, Jul 9, 2013 at 1:26 PM, Donald Miner <dminer@clearedgeit.com> wrote:
>
>> Has anyone developed tools to migrate data from an existing HBase
>> implementation to Accumulo? My team has done it "manually" in the past, but
>> it seems like it would be reasonable to write a process that handles the
>> steps in a more automated fashion.
>>
>> Here are a few sample designs I've kicked around:
>>
>> HBase -> mapreduce -> mappers bulk write to Accumulo -> Accumulo
>> or
>> HBase -> mapreduce -> tfiles via AccumuloFileOutputFormat -> Accumulo
>> bulk load -> Accumulo
>> or
>> HBase -> bulk export -> map-only mapreduce to translate hfiles into
>> tfiles (how hard would this be??) -> Accumulo bulk load -> Accumulo
>>
>> I guess this could be extended to go the other way around (and also
>> include Cassandra perhaps).
>>
>> Maybe we'll start working on this soon. I just wanted to kick the idea
>> out there to see if it's been done before or if anyone has some gut
>> reactions to the process.
>>
>> -Don
>>
>
>


-- 
Donald Miner
Chief Technology Officer
ClearEdge IT Solutions, LLC
Cell: 443 799 7807
www.clearedgeit.com

