hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Seeking advice on skipped/lost data during data migration from and to a hbase table
Date Sun, 05 Feb 2017 20:36:05 GMT
Which release of HBase are you using?

To be specific, does the release have HBASE-15378?


On Sun, Feb 5, 2017 at 11:32 AM, Alexandre Normand <
alexandre.normand@gmail.com> wrote:

> We're migrating data from a previous iteration of a table to a new one,
> and this process involved an MR job that scans data from the source table
> and writes the equivalent data into the new table. The source table has
> 6000+ regions and splits frequently because we're still ingesting time
> series data into it. We used buffered writes on the other end when writing
> to the new table, and we have a YARN resource pool to limit the concurrent
> writing. First, I should say that this job took a long time but still
> mostly worked.
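>
> For context, here's a simplified sketch of the scan side of the job setup
> (table and class names are placeholders, not our actual code):
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.client.Scan;
>   import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
>   import org.apache.hadoop.mapreduce.Job;
>   import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
>
>   Configuration conf = HBaseConfiguration.create();
>   Job job = Job.getInstance(conf, "table-migration");
>   job.setJarByClass(MigrationMapper.class);
>
>   Scan scan = new Scan();
>   scan.setCaching(500);        // rows fetched per scanner RPC
>   scan.setCacheBlocks(false);  // recommended for full-table MR scans
>
>   // One input split (and therefore one map task) per region of the source.
>   TableMapReduceUtil.initTableMapperJob(
>       "source_table", scan, MigrationMapper.class, null, null, job);
>
>   // The mapper writes to the destination itself via a BufferedMutator,
>   // so the job is map-only with no output format.
>   job.setOutputFormatClass(NullOutputFormat.class);
>   job.setNumReduceTasks(0);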
> However, we've built a mechanism to compare the data fetched from each of
> the two tables (a rough sketch follows this list), and found that some
> rows (0.02%) are missing from the destination. We've ruled out a few
> things already:
> * A functional bug in the job that would have resulted in skipping that
> 0.02% of the rows.
> * The possibility that the data didn't yet exist when the migration job
> initially ran.
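>
> For what it's worth, the comparison boils down to something like this
> (simplified, assuming an open Connection conn; the real job samples key
> ranges and also diffs cell contents rather than just checking existence):
>
>   Scan scan = new Scan();
>   scan.setStartRow(startKey);  // boundaries of the sampled key range
>   scan.setStopRow(stopKey);
>   try (Table source = conn.getTable(TableName.valueOf("source_table"));
>        Table dest = conn.getTable(TableName.valueOf("dest_table"));
>        ResultScanner results = source.getScanner(scan)) {
>     for (Result r : results) {
>       // Report source row keys with no counterpart in the destination.
>       if (!dest.exists(new Get(r.getRow()))) {
>         System.out.println("missing: " + Bytes.toStringBinary(r.getRow()));
>       }
>     }
>   }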
> At a high level, the suspects could be:
> * The source table splitting could have resulted in some input keys not
> being read. However, since an HBase split is defined by a startKey/endKey
> pair, we wouldn't expect this to lose rows unless there was a bug in there
> somehow.
> * The writing/flushing losing a batch. Since we're buffering writes and
> flush everything in the cleanup of the map tasks (see the sketch after
> this list), we would expect write failures to cause task failures/retries
> and therefore not to be a problem in the end. Given that this flush is
> synchronous and, according to our understanding, completes once the data
> is in the WAL and memstore, this also seems unlikely unless there's a bug.
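>
> For reference, the write path is essentially the following (simplified,
> with placeholder names):
>
>   import java.io.IOException;
>   import org.apache.hadoop.hbase.Cell;
>   import org.apache.hadoop.hbase.TableName;
>   import org.apache.hadoop.hbase.client.BufferedMutator;
>   import org.apache.hadoop.hbase.client.Connection;
>   import org.apache.hadoop.hbase.client.ConnectionFactory;
>   import org.apache.hadoop.hbase.client.Put;
>   import org.apache.hadoop.hbase.client.Result;
>   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>   import org.apache.hadoop.hbase.mapreduce.TableMapper;
>   import org.apache.hadoop.io.NullWritable;
>
>   public class MigrationMapper
>       extends TableMapper<NullWritable, NullWritable> {
>
>     private Connection connection;
>     private BufferedMutator mutator;
>
>     @Override
>     protected void setup(Context context) throws IOException {
>       connection = ConnectionFactory.createConnection(context.getConfiguration());
>       mutator = connection.getBufferedMutator(TableName.valueOf("dest_table"));
>     }
>
>     @Override
>     protected void map(ImmutableBytesWritable row, Result value, Context context)
>         throws IOException {
>       Put put = new Put(row.get());
>       for (Cell cell : value.rawCells()) {
>         put.add(cell);  // copy each cell verbatim into the new table
>       }
>       mutator.mutate(put);  // buffered client-side; not yet on the wire
>     }
>
>     @Override
>     protected void cleanup(Context context) throws IOException {
>       // Synchronous flush: blocks until buffered Puts are in the WAL and
>       // memstore. A failure here throws, failing the task and forcing a
>       // retry, which is why we expect no silent loss on this side.
>       try {
>         mutator.flush();
>       } finally {
>         mutator.close();
>         connection.close();
>       }
>     }
>   }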
> I should add that we've extracted a sample of 1% of the source rows
> (checking all of them is really time-consuming because of the size of the
> data) and found that the missing data often appears in clusters of
> contiguous source row keys. This doesn't really point to a problem on
> either the scan side or the write side (since a failure in either would
> produce similar output), but we thought it was interesting. That said, we
> do have a few missing keys that aren't clustered. This could be because
> we've only run the comparison on 1% of the data, or it could be that
> whatever is causing this can also affect very isolated cases.
> We're now trying to understand how this could have happened, both to
> understand how it could impact other jobs/applications and to gain enough
> confidence to write a modified version of the migration job that
> re-migrates the skipped/missing data.
> Any ideas or advice would be much appreciated.
> Thanks!
> --
> Alex
