accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: [VOTE] Apache Accumulo 1.6.1 RC1
Date Wed, 24 Sep 2014 14:03:11 GMT
On Wed, Sep 24, 2014 at 12:43 AM, Josh Elser <josh.elser@gmail.com> wrote:

> Well, color me shocked -- the verify found some bad data. It looks
> like two keys have bad checksums (which I assume is what created the
> UNDEFINEDs, too?).
>
> CORRUPT 2
>

Oh wow, I have not seen that happen since I added the checksums to CI.
Can you run a memory diagnostic on your workstation?

We should probably review the checksum-related code to make sure there are no
errors in it.


> REFERENCED 2199999908
> UNDEFINED 2
> UNREFERENCED 874770
>
> I ran two tabletservers on my desktop, turned on hflush instead of
> hsync, switched from GZ to snappy and upped the splits threshold for
> 4g and let CI run for ~5 hours. I killed the tservers about a dozen
> times by hand throughout the day (kill -9), and the master once or
> twice. The datanode was left alone. This was running on 2.6.0-SNAPSHOT
> from around 9/14/2014.
>
> The offending keys are:
>
> 389a85668b6ebf8e 2ff6:4a78 [] 1411499115242
>
> 3a10885b-d481-4d00-be00-0477e231ey65:000000008576b169:0cd98965c9ccc1d0:ba15529e
>
> and
>
> 7e56b58a0c7df128 5fa0:6249 [] 1411499311578
>
> 3a10885b-d481-4d00-be00-0477e231e965:0000p000872d60eb:499fa72752d82a7c:5c5f19e8
>
> which both happened a little after 3:00pm eastern (I stopped CI around
> 3:30pm eastern). I don't see anything immediately wrong in the tserver
> logs (nor does it appear that I had restarted either of them around
> the timestamp of the above keys). I see no errors in the DN logs
> either around that time window.
>
> I don't have a clue how to even start looking at this to figure out if
>

If you had turned on archiving of walogs, you could look in the walog and
see if the data matches.

You can also check whether this data was written around the time of a kill
event. Every CI entry has a counter and an ingester ID. Using the counter and
ingester ID, you can look in the ingester's log file and find a time range for
when that data was ingested. Using that info, you can determine which tablet it
was written to and where that tablet was assigned at the time.
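
As a rough sketch of the first step, the Python below splits a CI value into
its colon-separated fields so the counter and ingester ID can be pulled out
and sanity-checked. The field layout (ingester ID, counter, previous row,
checksum) is an assumption inferred from the two values quoted above, not
taken from the CI source, and inspect_ci_value is a hypothetical helper.

```python
import re

# Assumed layout of a CI value, inferred from the quoted keys:
#   <ingester id (UUID)>:<counter (16 hex)>:<previous row (hex)>:<checksum (hex)>
HEX = re.compile(r"[0-9a-f]+")
UUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def inspect_ci_value(value):
    """Split a CI value into fields and list the fields that look malformed."""
    parts = value.split(":")
    if len(parts) != 4:
        return {"problems": ["field_count"]}
    ingester_id, counter, prev_row, checksum = parts

    problems = []
    if not UUID.fullmatch(ingester_id):
        problems.append("ingester_id")
    if not (len(counter) == 16 and HEX.fullmatch(counter)):
        problems.append("counter")
    # The previous-row field is allowed to be empty in this sketch.
    if prev_row and not HEX.fullmatch(prev_row):
        problems.append("prev_row")
    if not HEX.fullmatch(checksum):
        problems.append("checksum")

    return {
        "ingester_id": ingester_id,
        "counter": counter,
        "prev_row": prev_row,
        "checksum": checksum,
        "problems": problems,
    }


if __name__ == "__main__":
    for v in (
        "3a10885b-d481-4d00-be00-0477e231ey65:000000008576b169:0cd98965c9ccc1d0:ba15529e",
        "3a10885b-d481-4d00-be00-0477e231e965:0000p000872d60eb:499fa72752d82a7c:5c5f19e8",
    ):
        print(inspect_ci_value(v)["problems"])
```

Run against the two values above, this flags the ingester ID in the first and
the counter in the second, which narrows down which field to distrust when
searching the ingester's log.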


> something indeed went wrong, or if it's some other sort of issue. To
> be clear, this as it stands isn't sufficient to make me change my
> vote.
>
> On Tue, Sep 23, 2014 at 3:04 PM, Josh Elser <josh.elser@gmail.com> wrote:
> > +1
> >
> > * Verified checksums+sigs
> > * Build from source tarball and ran all unit+functional tests against
> > Apache Hadoop 2.5.1 and 2.6.0-SNAPSHOT
> > * Ingested 2B records w/ CI + clean verify with single tserver (Apache
> > Hadoop 2.6.0-SNAPSHOT + Apache ZooKeeper 3.4.5)
> > * Ingested ~2.5B records w/ CI with 2 tservers and some manual
> > agitation (Apache Hadoop 2.6.0-SNAPSHOT + Apache ZooKeeper 3.4.5)
> >     - Currently running verify, will report if I get a failed verify
> > * Ran some Hive queries (w/ Apache Hive-0.14.0-SNAPSHOT & Apache Tez
> > 0.6.0-SNAPSHOT)
> > * Ran some Pig queries (w/ Apache Pig-0.13.0)
> >
> > Thanks for organizing this, Corey!!
> >
> > On Fri, Sep 19, 2014 at 10:49 PM, Corey Nolet <cjnolet@gmail.com> wrote:
> >> Devs,
> >>
> >> Please consider the following candidate for Apache Accumulo 1.6.1
> >>
> >> Branch: 1.6.1-rc1
> >> SHA1: 88c5473b3b49d797d3dabebd12fe517e9b248ba2
> >> Staging Repository:
> >> https://repository.apache.org/content/repositories/orgapacheaccumulo-1017/
> >>
> >> Source tarball:
> >> http://repository.apache.org/content/repositories/orgapacheaccumulo-1017/org/apache/accumulo/accumulo/1.6.1/accumulo-1.6.1-src.tar.gz
> >> Binary tarball:
> >> http://repository.apache.org/content/repositories/orgapacheaccumulo-1017/org/apache/accumulo/accumulo/1.6.1/accumulo-1.6.1-bin.tar.gz
> >> (Append ".sha1", ".md5" or ".asc" to download the signature/hash for a
> >> given artifact.)
> >>
> >> Signing keys available at: https://www.apache.org/dist/accumulo/KEYS
> >>
> >> Over 1.6.1, we have 188 issues resolved
> >> https://git-wip-us.apache.org/repos/asf?p=accumulo.git;a=blob;f=CHANGES;h=91b9d31e3b9dc53f1a576cc49bbc061919eb0070;hb=1.6.1-rc1
> >>
> >> Testing: All unit and functional tests are passing.
> >>
> >> Vote will be open until Thursday, September 25th 12:00AM UTC (9/24 8:00PM
> >> ET, 9/24 5:00PM PT)
>
