From: lars hofhansl
To: "user@hbase.apache.org"
Date: Tue, 12 Aug 2014 22:14:28 -0700
Subject: Re: How to verify data from MySQL and HBase

Just in the interest of stating the obvious: Don't write a tool that scans through all the data in MySQL (or HBase) and then looks up each individual row (or even batches of rows) in the other store. That is very inefficient if you have a lot of data.

Do it like a merge-join instead: Get sorted results from MySQL such that they sort in the same order as the HBase key. Then read through those results and the HBase scan at the same time, advancing both sides together. That way you need only a single scan on both sides (per table).

-- Lars

________________________________
From: tobe
To: "user@hbase.apache.org"
Sent: Tuesday, August 12, 2014 7:01 AM
Subject: Re: How to verify data from MySQL and HBase

Thanks @JM for the detailed explanation. I totally agree with you, and we're developing an internal tool to do it. It's not very general because we have to write the SQL and generate the row key manually, but it works and is easy to understand. I would like to share it with anybody who also needs it.

On Tue, Aug 12, 2014 at 8:45 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi Tobe,
>
> The thing is, your data in HBase might be organized very differently than in
> MySQL. Have you denormalized some of it? Have you put some Avro containers
> into an HBase cell? Have you done any cleanup? Or enrichment? In the end, it
> may be very different from what is stored in MySQL. There is no easy
> process to validate that the migration has been done correctly between the two
> databases, because tools don't know the transformations you applied. I think
> you will have to build something internally to do this validation,
> because no such tool exists today.
>
> You can use the row counters to count the rows, but will that mean the content
> is correct? No. Calculate a CRC on the cell values? Not even that, because you
> might have denormalized.
>
> Sorry, but you will have to do some coding here, I think.
>
> JM
>
> 2014-08-12 3:49 GMT-04:00 tobe:
>
> > Thanks for replying, @Serega.
> >
> > I know MySQL and HBase are reliable. What I want to validate is the data in
> > both MySQL and HBase. The upstream application and unexpected operations
> > may make it inconsistent.
> >
> > I'm also wondering whether Sqoop could validate the values from each database.
> > There's RowCountValidator, but it's not suitable for us.
> >
> >
> > On Tue, Aug 12, 2014 at 3:23 PM, Serega Sheypak <serega.sheypak@gmail.com>
> > wrote:
> >
> > > You should design resilient ETL processes. Also introduce post-ETL checks.
> > > There is no need to test MySQL or HBase. They are already tested.
> > > See this:
> > >
> > > http://www.slideshare.net/wyaddow/data-verification-in-qa-department-final
> > > Pretty old, but it gives the basic ideas. Nothing has changed since then.
> > >
> > >
> > > 2014-08-12 7:55 GMT+04:00 tobe:
> > >
> > > > Most of our users migrated their data from MySQL to HBase. Before they
> > > > totally trust HBase, they use MySQL and HBase at the same time. Sometimes
> > > > the data is inconsistent because they use it incorrectly or maybe there are
> > > > bugs in HBase. Anyway, we have to make sure the data from MySQL and HBase
> > > > is consistent.
> > > >
> > > > So how can we do that? Write a simple script, or is there any general
> > > > method?
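
For reference, the merge-join approach described at the top of this message can be sketched in Python roughly as follows. This is only an illustration under assumptions, not the tool anyone in this thread built. The function name merge_verify and the (row_key, value) tuple shape are hypothetical. Both inputs are assumed to be already sorted by the same key in the same byte order (HBase orders row keys lexicographically by bytes, so the MySQL ORDER BY has to produce that same order), and each key is assumed to appear at most once per side. Plain iterables stand in for the MySQL cursor and the HBase scanner.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical row shape: (row_key, value), both as bytes.
Row = Tuple[bytes, bytes]


def merge_verify(mysql_rows: Iterable[Row],
                 hbase_rows: Iterable[Row]) -> Iterator[Tuple[str, bytes]]:
    """Walk two key-sorted streams in lockstep and yield (kind, key)
    for every discrepancy. Each stream is read exactly once."""
    left, right = iter(mysql_rows), iter(hbase_rows)
    l: Optional[Row] = next(left, None)
    r: Optional[Row] = next(right, None)

    while l is not None or r is not None:
        if r is None or (l is not None and l[0] < r[0]):
            # Key exists only on the MySQL side; advance MySQL.
            yield ("missing_in_hbase", l[0])
            l = next(left, None)
        elif l is None or r[0] < l[0]:
            # Key exists only on the HBase side; advance HBase.
            yield ("missing_in_mysql", r[0])
            r = next(right, None)
        else:
            # Same key on both sides; compare values, advance both.
            if l[1] != r[1]:
                yield ("value_mismatch", l[0])
            l, r = next(left, None), next(right, None)


if __name__ == "__main__":
    # Stand-in data; a real run would stream from a MySQL cursor
    # and an HBase scan instead of in-memory lists.
    mysql = [(b"a", b"1"), (b"b", b"2"), (b"d", b"4")]
    hbase = [(b"a", b"1"), (b"b", b"x"), (b"c", b"3")]
    for kind, key in merge_verify(mysql, hbase):
        print(kind, key)
```

Only one row from each side is held in memory at a time, so the cost is one sequential pass per store. If the values were transformed on the way into HBase (denormalized, re-encoded, and so on), the same transformation would have to be applied to the MySQL side before the comparison.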