accumulo-dev mailing list archives

From Bill Havanki <bhava...@clouderagovt.com>
Subject Re: [VOTE] Apache Accumulo 1.6.1 RC1
Date Wed, 24 Sep 2014 16:58:16 GMT
+1

- MD5 and SHA1 checksums verified
- signatures verified
- unit tests pass
- integration tests pass except one (see below)
- recursive MD5 of all files passes
- two one-hour CI runs (one with agitation, one without) verified
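
For anyone repeating the checksum items above, here is a minimal sketch of one way to do it in Python. It is not necessarily how the checks in this message were run. The function names and any paths passed to them are hypothetical, and the expected digest is assumed to be a bare hex string copied from the published checksum file.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm):
    """Return the hex digest of one file, read in chunks to bound memory."""
    h = hashlib.new(algorithm)  # e.g. "md5" or "sha1"
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, algorithm, expected_hex):
    """Compare a file's digest to an expected hex string (assumed bare hex)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


def tree_digests(root, algorithm="md5"):
    """Map each file's path (relative to root) to its digest, recursively."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Two directory trees can then be compared by checking that their `tree_digests` results are equal.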

I consistently get a timeout failure from DeleteTableDuringSplitIT, but it
doesn't seem others get that so I'm not overly concerned. I also eagerly
await the findings from Josh's failed CI verification.

Cluster details for CI tests:
- three nodes (one master/worker, two workers)
- CDH 5.1.2 hosting HDFS (2.3.0-cdh5.1.2), YARN (ditto), ZK (3.4.5-cdh5.1.2)
- CentOS 6.4 running on VMs (4 CPUs, 10G RAM)

On Wed, Sep 24, 2014 at 12:11 PM, Josh Elser <josh.elser@gmail.com> wrote:

> Keith Turner wrote:
>
>>> 7e56b58a0c7df128 5fa0:6249 [] 1411499311578
>>>
>>> 3a10885b-d481-4d00-be00-0477e231e965:0000p000872d60eb:
>>> 499fa72752d82a7c:5c5f19e8
>>>
>>> which both happened a little after 3:00pm eastern (I stopped CI around
>>> 3:30pm eastern). I don't see anything immediately wrong in the tserver
>>> logs (nor does it appear that I had restarted either of them around
>>> the timestamp of the above keys). I see no errors in the DN logs
>>> either around that time window.
>>>
>>> I don't have a clue how to even start looking at this to figure out if
>>>
>>
>> If you had turned on archiving of walogs, you could look in the walog and
>> see if the data matches.
>>
>> You can also see if this data was written around the time of a kill event.
>> Every CI entry has a counter and an ingester ID.  Using the counter and
>> ingester ID, you can look in the ingester's log file and find a time range
>> for when that data was ingested.  Using that info you can determine what
>> tablet it was written to and where that tablet was assigned at the time.
>>
>>
> If I can't find any other reason that might have caused the failure, I'll
> have to re-run with walog archiving turned on.
>
> I checked the tserver logs and neither was killed around the time the
> anomalies occurred.
>
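
As a rough aid for the lookup Keith describes in the quoted text, here is a small Python sketch that pulls the ingester ID and counter out of a CI value and converts a key timestamp to UTC. The field order and the names `prev_row` and `checksum` are assumptions inferred from the shape of the quoted entry, not taken from the Accumulo source. It also assumes the two wrapped lines in the quote form a single colon-separated value.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CIValue(NamedTuple):
    # Field order is an assumption based on the quoted sample value.
    ingester_id: str
    counter: str
    prev_row: str   # assumed meaning
    checksum: str   # assumed meaning


def parse_ci_value(value: str) -> CIValue:
    """Split a CI value into its four colon-separated fields."""
    parts = value.strip().split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}")
    return CIValue(*parts)


def key_timestamp_utc(millis: int) -> datetime:
    """Convert a key timestamp in epoch milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Sample taken from the quoted entry (value lines joined, an assumption).
sample = parse_ci_value(
    "3a10885b-d481-4d00-be00-0477e231e965:0000p000872d60eb:"
    "499fa72752d82a7c:5c5f19e8"
)
print(sample.ingester_id, sample.counter)
print(key_timestamp_utc(1411499311578).strftime("%Y-%m-%d %H:%M:%S"))
# prints "2014-09-23 19:08:31" (UTC)
```

The ingester ID and counter are the two pieces needed to search the matching ingester's log for a time range. The log format itself is not shown in this thread, so that step is left out.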



-- 
// Bill Havanki
// Solutions Architect, Cloudera Govt Solutions
// 443.686.9283
