hbase-dev mailing list archives

From Cosmin Lehene <cleh...@adobe.com>
Subject MR job "randomly" scans thousands of rows fewer than it should.
Date Thu, 02 Feb 2012 04:46:21 GMT
We have an MR job that runs every few minutes on time series data that is continuously
updated (never deleted).
Once every few tens to hundreds of runs, the map task that covers the last region
gets fewer input records (off by 500-5000 rows), without any splits happening. The lower
number of input records can persist for a few MR runs, but eventually returns to the
"correct" value.

This drop shows up in the "map input records" counter and is also reflected in the
metrics computed by the MR job itself (so it's not an MR counter bug).

There are no exceptions in the MR job or in the region server, and the drop doesn't seem to be
correlated with any compaction, split or region movement.
The only "variables" in this scenario are the new data that gets injected continuously and the
actual MR job, which is idempotent.

This entire puzzle takes place on HBase 0.90.5-ish (12 Dec 2011) on top of Hadoop cdh3u2.

