hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: How to verify data from MySQL and HBase
Date Tue, 12 Aug 2014 12:45:52 GMT
Hi Tobe,

Thing is, your data in HBase might be organized very differently than in
MySQL. Have you denormalized some of it? Have you stored Avro containers
in an HBase cell? Did you do any cleanup? Or enrichment? In the end, it
may be very different from what is stored in MySQL. There is no easy
way to validate that the migration has been done correctly between the two
databases, because generic tools don't know the transformations you applied.
I think you will have to build something internally to do this validation,
because no such tool exists today.

You can use the row counters to count the rows, but does that mean the
content is correct? No. Calculate a CRC on the cell values? Not even that,
because you might have denormalized.
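To make the row-count and CRC point concrete, here is a small Python sketch with made-up data. The table layout, the `user:<id>` row key and the `info`/`orders` column families are hypothetical, chosen only to show denormalization. A correctly migrated row still fails both naive checks:

```python
import zlib

# Hypothetical normalized MySQL data: one user row plus two order rows.
mysql_users = [{"id": 1, "name": "alice"}]
mysql_orders = [
    {"id": 10, "user_id": 1, "amount": "9.99"},
    {"id": 11, "user_id": 1, "amount": "5.00"},
]

# Hypothetical denormalized HBase layout: one row per user, with the orders
# folded into the same row as extra columns. HBase stores cells as bytes.
hbase_rows = {
    b"user:1": {
        b"info:name": b"alice",
        b"orders:10": b"9.99",
        b"orders:11": b"5.00",
    },
}


def crc_of_values(values):
    """Running CRC32 over the byte values, in the order given."""
    crc = 0
    for v in values:
        crc = zlib.crc32(v, crc)
    return crc


# Naive check 1: row counts. MySQL has 3 rows, HBase has 1, although
# nothing was lost in the migration.
mysql_count = len(mysql_users) + len(mysql_orders)
hbase_count = len(hbase_rows)

# Naive check 2: CRC over raw values. Ids and foreign keys moved into the
# row key and column qualifiers in HBase, so the value streams differ.
mysql_values = [
    str(v).encode() for row in mysql_users + mysql_orders for v in row.values()
]
hbase_values = [v for cells in hbase_rows.values() for v in cells.values()]

print("row counts:", mysql_count, hbase_count)
print("CRCs equal:", crc_of_values(mysql_values) == crc_of_values(hbase_values))
```

Both checks disagree here even though nothing was lost, so on their own they can only serve as a rough smoke test.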

Sorry, but you will have to do some coding here I think.
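A minimal sketch of that kind of in-house check, assuming you can express your migration as a function: re-apply the same transformation to the MySQL rows and compare the result cell by cell with what HBase returns. `to_hbase`, `expected_rows`, `diff` and the data layout are hypothetical; reading from MySQL and scanning HBase is left out and assumed to yield plain dicts.

```python
from typing import Dict, List, Tuple

# Cells of one HBase row: b"family:qualifier" -> value.
Cells = Dict[bytes, bytes]


def to_hbase(user: dict, orders: List[dict]) -> Tuple[bytes, Cells]:
    """Hypothetical migration transform; it must mirror what the real ETL does."""
    cells: Cells = {b"info:name": user["name"].encode()}
    for order in orders:
        cells[b"orders:%d" % order["id"]] = order["amount"].encode()
    return b"user:%d" % user["id"], cells


def expected_rows(users: List[dict], orders: List[dict]) -> Dict[bytes, Cells]:
    """Apply the transform to the MySQL rows to get what HBase should hold."""
    orders_by_user: Dict[int, List[dict]] = {}
    for order in orders:
        orders_by_user.setdefault(order["user_id"], []).append(order)
    return dict(to_hbase(u, orders_by_user.get(u["id"], [])) for u in users)


def diff(expected: Dict[bytes, Cells], actual: Dict[bytes, Cells]) -> List[str]:
    """List missing rows, unexpected rows and cells whose values differ."""
    problems = []
    for key in sorted(expected.keys() - actual.keys()):
        problems.append(f"missing row {key!r}")
    for key in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected row {key!r}")
    for key in sorted(expected.keys() & actual.keys()):
        exp, act = expected[key], actual[key]
        for col in sorted(exp.keys() | act.keys()):
            if exp.get(col) != act.get(col):
                problems.append(
                    f"{key!r} {col!r}: expected {exp.get(col)!r}, got {act.get(col)!r}"
                )
    return problems


# Example: rows as they might come back from a MySQL query and an HBase scan.
users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
orders = [{"id": 10, "user_id": 1, "amount": "9.99"}]
actual = {b"user:1": {b"info:name": b"alice", b"orders:10": b"9.90"}}

for problem in diff(expected_rows(users, orders), actual):
    print(problem)
```

The key design choice is that `to_hbase` mirrors the real ETL code (ideally by sharing it), so the comparison happens in HBase's shape rather than MySQL's. For large tables you would likely run this per row-key range in a batch job instead of loading everything into memory.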



2014-08-12 3:49 GMT-04:00 tobe <tobeg3oogle@gmail.com>:

> Thanks for replying, @Serega.
> I know MySQL and HBase are reliable. What I want to validate is the data in
> both MySQL and HBase. The upstream application and unexpected operations
> may make them inconsistent.
> I'm also wondering whether Sqoop could validate the values from each
> database. There's RowCountValidator, but it's not suitable for us.
> On Tue, Aug 12, 2014 at 3:23 PM, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
> > You should design resilient ETL processes. Also introduce post-ETL
> > checks.
> > There is no need to test MySQL or HBase. They are already tested.
> > See this:
> > http://www.slideshare.net/wyaddow/data-verification-in-qa-department-final
> > Pretty old, but it gives the basic ideas. Nothing has changed since then.
> >
> >
> > 2014-08-12 7:55 GMT+04:00 tobe <tobeg3oogle@gmail.com>:
> >
> > > Most of our users migrated their data from MySQL to HBase. Before they
> > > fully trust HBase, they use MySQL and HBase at the same time. Sometimes
> > > the data is inconsistent because they use it incorrectly, or maybe there
> > > are bugs in HBase. Either way, we have to make sure the data in MySQL
> > > and HBase is consistent.
> > >
> > > So how can we do that? Write a simple script, or is there a general
> > > method?
> > >
> >
