hbase-user mailing list archives

From Patrick Schless <patrick.schl...@gmail.com>
Subject Re: Replication - some timestamps off by 1 ms
Date Thu, 11 Jul 2013 21:36:48 GMT
Interesting (thanks for the info). I don't suppose there's an easy way to
filter those incremented cells out, so the response from verifyRep is
meaningful? :)
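
One possible workaround, sketched here rather than taken from HBase itself, is to post-process the "This result was different" log entries and ignore cells whose only difference is a timestamp within a small tolerance. The function names `parse_keyvalues` and `real_differences` and the `tolerance_ms` parameter are hypothetical. The sketch assumes the `row/family:qualifier/timestamp/type/vlen=N` layout seen in the log lines quoted below, one version per cell, and that the mail client's line wrapping has already been undone. Since the log prints only `vlen`, it cannot confirm that the values themselves match.

```python
import re
from typing import Dict, List, Tuple

# One cell looks like: row/family:qualifier/timestamp/type/vlen=N
# The qualifier may itself contain '/', so the prefix is matched lazily and
# the numeric timestamp, type and vlen fields anchor the end of the cell.
CELL_RE = re.compile(r"(.+?)/(\d+)/(\w+)/vlen=(\d+)(?:,\s*|$)")


def parse_keyvalues(text: str) -> Dict[str, Tuple[int, str, int]]:
    """Map 'row/family:qualifier' to (timestamp, type, vlen)."""
    body = text.strip()
    if body.startswith("keyvalues={") and body.endswith("}"):
        body = body[len("keyvalues={"):-1]
    cells = {}
    for prefix, ts, kv_type, vlen in CELL_RE.findall(body):
        # Assumption: a single version per cell, so later duplicates overwrite.
        cells[prefix] = (int(ts), kv_type, int(vlen))
    return cells


def real_differences(source: str, peer: str, tolerance_ms: int = 1) -> List[str]:
    """Return cells that differ by more than a small timestamp skew."""
    a, b = parse_keyvalues(source), parse_keyvalues(peer)
    diffs = []
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            diffs.append(key)  # cell missing on one side
            continue
        (ts_a, type_a, vlen_a), (ts_b, type_b, vlen_b) = a[key], b[key]
        if type_a != type_b or vlen_a != vlen_b or abs(ts_a - ts_b) > tolerance_ms:
            diffs.append(key)
    return diffs


if __name__ == "__main__":
    # The two differing cells from the thread, as raw strings.
    row = r"01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01"
    src = "keyvalues={" + row + "/1373471223717/Put/vlen=8}"
    dst = "keyvalues={" + row + "/1373471223716/Put/vlen=8}"
    print(real_differences(src, dst))
```

With the default tolerance of 1 ms, the pair from the thread is treated as matching; passing `tolerance_ms=0` reports it again. This only hides the symptom in the report and does not distinguish incremented cells from any other cell that happens to be off by 1 ms.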


On Thu, Jul 11, 2013 at 3:44 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> Yeah increments won't work. I guess the warning isn't really visible
> but one place you can see it is:
>
> $ ./bin/hadoop jar ../hbase/hbase.jar
> An example program must be given as the first argument.
> Valid program names are:
>   CellCounter: Count cells in HBase table
>   completebulkload: Complete a bulk data load.
>   copytable: Export a table from local cluster to peer cluster
>   export: Write table data to HDFS.
>   import: Import data written by Export.
>   importtsv: Import data in TSV format.
>   rowcounter: Count rows in HBase table
> vvvv
>   verifyrep: Compare the data from tables in two different clusters.
> WARNING: It doesn't work for incrementColumnValues'd cells since the
> timestamp is changed after being appended to the log.
> ^^^^
>
> The problem is that increments' timestamps are different in the WAL
> and in the final KV that's stored in HBase.
>
> J-D
>
> On Thu, Jul 11, 2013 at 12:19 PM, Patrick Schless
> <patrick.schless@gmail.com> wrote:
> > It's possible, but I'm not sure. This is a live system, and we do use
> > increment, and it's a smaller portion of our writes into HBase. I can try
> > to duplicate it, but I can't say how these specific cells got written.
> >
> > Would incremented cells not get replicated correctly?
> >
> >
> > On Thu, Jul 11, 2013 at 12:53 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >
> >> Are those incremented cells?
> >>
> >> J-D
> >>
> >> On Thu, Jul 11, 2013 at 10:23 AM, Patrick Schless
> >> <patrick.schless@gmail.com> wrote:
> >> > I have had replication running for about a week now, and have had a lot of
> >> > data flowing to our slave cluster over that time. Now, I'm running the
> >> > verifyrep MR job over a 1-hour period a couple days ago (which should be
> >> > fully replicated), and I'm seeing a small number of "BADROWS".
> >> > Spot-checking a few of them, the issue seems to be that the rows are
> >> > present, and have the same values, but a single cell in the row will be off
> >> > by 1ms.
> >> >
> >> > For instance, the log reports this error:
> >> > java.lang.Exception: This result was different:
> >> > keyvalues={01e581745c6a43aba01adf105af4e4a92013071015/data:!\xDF\xE0\x01/1373470622986/Put/vlen=8,
> >> > 01e581745c6a43aba01adf105af4e4a92013071015/data:&s\xC0\x01/1373470923084/Put/vlen=8,
> >> > 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223717/Put/vlen=8,
> >> > 01e581745c6a43aba01adf105af4e4a92013071015/data:/\x9B\x80\x01/1373471523316/Put/vlen=8,
> >> > 01e581745c6a43aba01adf105af4e4a92013071015/data:4/`\x01/1373471822913/Put/vlen=8}
> >> > compared to
> >> > keyvalues={01e581745c6a43aba01adf105af4e4a92013071015/data:!\xDF\xE0\x01/1373470622986/Put/vlen=8,
> >> > 01e581745c6a43aba01adf105af4e4a92013071015/data:&s\xC0\x01/1373470923084/Put/vlen=8,
> >> > 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223716/Put/vlen=8,
> >> > 01e581745c6a43aba01adf105af4e4a92013071015/data:/\x9B\x80\x01/1373471523316/Put/vlen=8,
> >> > 01e581745c6a43aba01adf105af4e4a92013071015/data:4/`\x01/1373471822913/Put/vlen=8}
> >> >
> >> > Some diffing reduces the issue down to:
> >> > 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223717/Put/vlen=8
> >> > compared to
> >> > 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223716/Put/vlen=8.
> >> >
> >> > I'm assuming that the value before "/Put" is the cell's timestamp, which
> >> > means that the copies are off by 1ms.
> >> >
> >> > Any idea what could cause this? So far (the job is still running), the
> >> > problem seems rare (about 0.05% of rows).
> >> >
> >> > Thanks,
> >> > Patrick
> >>
>
