hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: MR job "randomly" scans up to thousands of rows fewer than it should.
Date Fri, 03 Feb 2012 01:03:34 GMT
HBASE-4838 ports HBASE-2856 to 0.92

FYI

On Thu, Feb 2, 2012 at 4:46 PM, Cosmin Lehene <clehene@adobe.com> wrote:

> (sorry for the damaged subject :))
>
>
> Hey Jon,
> We have two column families.
> There are no filters and it's a full table scan. We're not skipping
> rows.
> I did, however, see one occasion where a single qualifier "fault" showed
> up in the job counters (the qualifier was missing when it wasn't supposed
> to be). That happened only once, though, and it doesn't coincide with the
> runs where rows go missing.
>
> We see this behavior consistently, although I haven't found a way to
> reproduce it. I'll try running multiple instances of the job in parallel
> to find out whether that affects the outcome.
> I'll probably have to add more debugging for the affected rows and dig
> deeper.
>
> HBASE-2856 is a pretty large issue; do you think it could be related to
> what I'm seeing? If so, it could help me reproduce the problem.
>
> Thanks,
> Cosmin
>
> On 2/1/12 11:30 PM, "Jonathan Hsieh" <jon@cloudera.com> wrote:
>
> >Cosmin,
> >
> >How many column families do you have in this table? Are you using any
> >filters in your HBase scans? Are you skipping rows that may not have
> >qualifiers present?
> >
> >There are a few known issues with multi-CF atomicity, and a recent one
> >about flushes, that may be related to this problem. There is HBASE-2856,
> >a fix having to do with flushes, which is pretty intricate and only in
> >0.92.
> >
> >Jon.
> >
> >On Wed, Feb 1, 2012 at 8:46 PM, Cosmin Lehene <clehene@adobe.com> wrote:
> >
> >> We have an MR job that runs every few minutes on some time series data
> >> which is continuously updated (never deleted).
> >> Every few runs (in the range of tens to hundreds), the map task that
> >> covers the last region gets fewer input records (off by 500-5000 rows)
> >> without any splits happening. This lower number of input records can
> >> persist for a few MR runs, but it eventually gets back to the "correct"
> >> value.
> >>
> >> This drop shows up both in the "map input records" counter and in the
> >> metrics computed by the MR job (so it's not an MR counter bug).
> >>
> >> There are no exceptions in the MR job or in the region server, and this
> >> doesn't seem to be correlated with any compaction, split or region
> >> movement.
> >> The only "variables" in this scenario are the new data that gets
> >> injected continuously and the MR job itself (which is idempotent).
> >>
> >> This entire puzzle takes place on HBase 0.90.5-ish (12 Dec 2011) on top
> >> of Hadoop cdh3u2.
> >>
> >> Cosmin
> >
> >--
> >// Jonathan Hsieh (shay)
> >// Software Engineer, Cloudera
> >// jon@cloudera.com
>
>
