From: Donald Miner
To: user@accumulo.apache.org
Date: Tue, 9 Jul 2013 13:36:20 -0400
Subject: Re: Accumulo / HBase migration
In-Reply-To: <51DC4906.8000005@hoodel.com>

I did think about this. My naive answer is just by default ignore
visibilities (meaning make everything public or make everything the same
visibility). It would be interesting however to be able to insert a chunk
of code that inferred the visibility from the record itself. That is,
you'd have a function you can pass in that returns a ColumnVisibility and
takes in a value/rowkey/etc.

On Tue, Jul 9, 2013 at 1:31 PM, Kurt Christensen wrote:

> I don't have a response to your question, but it seems to me that the big
> capability difference is visibility field. When doing bulk translations
> like this, do you just fill visibility with some default value?
>
> -- Kurt
>
> On 7/9/13 1:26 PM, Donald Miner wrote:
>
>> Has anyone developed tools to migrate data from an existing HBase
>> implementation to Accumulo? My team has done it "manually" in the past but
>> it seems like it would be reasonable to write a process that handled the
>> steps in a more automated fashion.
>>
>> Here are a few sample designs I've kicked around:
>>
>> HBase -> mapreduce -> mappers bulk write to accumulo -> Accumulo
>> or
>> HBase -> mapreduce -> tfiles via AccumuloFileOutputFormat -> Accumulo
>> bulk load -> Accumulo
>> or
>> HBase -> bulk export -> map-only mapreduce to translate hfiles into
>> tfiles (how hard would this be??) -> Accumulo bulk load -> Accumulo
>>
>> I guess this could be extended to go the other way around (and also
>> include Cassandra perhaps).
>>
>> Maybe we'll start working on this soon. I just wanted to kick the idea
>> out there to see if it's been done before or if anyone has some gut
>> reactions to the process.
>>
>> -Don
>>
>> This communication is the property of ClearEdge IT Solutions, LLC and may
>> contain confidential and/or privileged information. Any review,
>> retransmissions, dissemination or other use of or taking of any action in
>> reliance upon this information by persons or entities other than the
>> intended recipient is prohibited. If you receive this communication in
>> error, please immediately notify the sender and destroy all copies of the
>> communication and any attachments.
>
> --
>
> Kurt Christensen
> P.O. Box 811
> Westminster, MD 21158-0811
>
> ------------------------------------------------------------------------
> I'm not really a trouble maker. I just play one on TV.

--
Donald Miner
Chief Technology Officer
ClearEdge IT Solutions, LLC
Cell: 443 799 7807
www.clearedgeit.com
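A rough sketch of the pluggable-visibility idea from the reply above, written in Python only for brevity. Everything named here is hypothetical: `HBaseCell` and `AccumuloEntry` are simplified stand-ins rather than the real HBase or Accumulo classes, the visibility is carried as a plain expression string instead of an actual ColumnVisibility object, and the `pii`/`private` rule is invented for illustration. The empty string is used for the "make everything public" default, on the assumption that an empty visibility means no restriction.

```python
from typing import Callable, NamedTuple


class HBaseCell(NamedTuple):
    """Simplified stand-in for one HBase cell (hypothetical, not the HBase API)."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


class AccumuloEntry(NamedTuple):
    """Simplified stand-in for one Accumulo key/value pair (hypothetical)."""
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: str  # visibility expression; "" is assumed to mean unrestricted
    timestamp: int
    value: bytes


# A visibility function takes the source record and returns an expression.
VisibilityFn = Callable[[HBaseCell], str]


def default_visibility(cell: HBaseCell) -> str:
    """The 'naive' option: give every cell the same (empty) visibility."""
    return ""


def visibility_by_family(cell: HBaseCell) -> str:
    """Invented example rule: restrict one column family, leave the rest open."""
    return "private" if cell.family == b"pii" else ""


def translate(cell: HBaseCell,
              visibility_fn: VisibilityFn = default_visibility) -> AccumuloEntry:
    """Copy the cell's fields across and fill in the inferred visibility."""
    return AccumuloEntry(
        row=cell.row,
        family=cell.family,
        qualifier=cell.qualifier,
        visibility=visibility_fn(cell),
        timestamp=cell.timestamp,
        value=cell.value,
    )
```

In any of the three designs, a function like `translate` would sit in the map step, with the visibility function supplied by whoever runs the migration; a real job would presumably do the same thing against the Java APIs.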