accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: [VOTE] Apache Accumulo 1.7.0-rc3
Date Fri, 15 May 2015 20:49:48 GMT
Great! Please include me in your recoverLease() investigations. I'll try 
to help out however possible.

Eric Newton wrote:
> +1
>
> My RW testing went well enough, and I didn't see failures until the cluster
> hung up. ACCUMULO-3818 is a test issue, and not a problem with Accumulo.
>
> I was able to build 1.7.0 from the source tarball.
>
> I'll continue to investigate the recoverLease issue.
>
> -Eric
>
> On Fri, May 15, 2015 at 4:09 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>> In some of my testing on HDFS 2.6.0 and agitation, I believe leases
>> weren't being recovered until the datanode was restarted (which agitation
>> killed). Every time I jstack'ed the master, I saw it hung on the HDFS
>> recoverLease() method. I saw nothing that suggested a bug in Accumulo (a
>> nod towards Christopher's assessment).
>>
>>
>> Eric Newton wrote:
>>
>>> Can you quantify "Some lease recoveries taking a long time in HDFS."?
>>>
>>> I think Keith and I are ok with the release. We're looking into our
>>> (shared) testing environment to figure out why Accumulo and HDFS are not
>>> getting along during recoveries.
>>>
>>> -Eric
>>>
>>>
>>> On Fri, May 15, 2015 at 3:51 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>
>>>> 72hr 3node CI verify w/ agitation just succeeded.
>>>> org.apache.accumulo.test.continuous.ContinuousVerify$Counts
>>>>       REFERENCED=19706560041
>>>>       UNREFERENCED=29844289
>>>>
>>>> Repeatedly ran into issues with HoldTime from HDFS (wrote a loop to just
>>>> restart the ingester). Some lease recoveries taking a long time in HDFS.
>>>> Nothing super impactful.
>>>>
>>>> I believe that wraps up all required testing.
>>>>
>>>>
>>>> Josh Elser wrote:
>>>>
>>>>> 24hr CI just verified
>>>>> org.apache.accumulo.test.continuous.ContinuousVerify$Counts
>>>>> REFERENCED=5754999773
>>>>> UNREFERENCED=6000227
>>>>>
>>>>> Josh Elser wrote:
>>>>>
>>>>>> 24hr RW on 3 nodes w/ agitation just finished positively.
>>>>>> Josh Elser wrote:
>>>>>>
>>>>>>> Testing: All unit and integration tests are passing. Completed 3-day
>>>>>>> CI w/o agitation or verification. Completed 24hr RandomWalk w/o agitation
>>>>>>> on 3 nodes. 95% through 24hr CI and RW w/ agitation.
>>>>>>>
>>>>>>>
>
