mahout-dev mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Random Errors
Date Fri, 07 Jun 2013 12:51:37 GMT
Having looked at it recently -- no, the parallelism is per-class, for
just this reason.

I suspect the problem is a race condition vis-a-vis HDFS. Usually some
operation like a delete is visible a moment later when a job starts, but
maybe not always. It could also be some internal source of randomness
somewhere in a library that can't be controlled externally, but I find
that an unlikely explanation for this.
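
To make the suspected race concrete, here is a minimal sketch of a test
helper that clears an output directory and then checks that it is really
gone before a job is started. It is only an illustration under stated
assumptions: it uses java.nio.file on the local filesystem (the failing
path in the trace is a file:/tmp one) rather than Hadoop's FileSystem API,
and the class and method names (OutputDirs, deleteAndVerify,
freshOutputPath) are hypothetical and not part of Mahout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Mahout or Hadoop.
final class OutputDirs {

  private OutputDirs() {}

  /** Recursively deletes dir if it exists, then fails fast if it is still visible. */
  static void deleteAndVerify(Path dir) throws IOException {
    if (Files.exists(dir)) {
      List<Path> paths;
      try (Stream<Path> walk = Files.walk(dir)) {
        // Reverse order puts children before their parent directories.
        paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      }
      for (Path p : paths) {
        Files.delete(p);
      }
    }
    // Assumption: a job started after this check should not see a stale directory.
    if (Files.exists(dir)) {
      throw new IOException("Output directory still exists: " + dir);
    }
  }

  /** Returns a not-yet-created output path under a fresh, uniquely named temp directory. */
  static Path freshOutputPath(String testName) throws IOException {
    Path base = Files.createTempDirectory("mahout-" + testName + "-");
    return base.resolve("output");
  }

  public static void main(String[] args) throws IOException {
    Path out = freshOutputPath("demo");
    Files.createDirectories(out);
    deleteAndVerify(out);
    System.out.println("exists after delete: " + Files.exists(out));
  }
}
```

Giving each test method its own output path, as freshOutputPath does, avoids
relying on the delete at all. That may be the simpler option if the delete
is not always visible in time.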

On Fri, Jun 7, 2013 at 1:03 PM, Sebastian Schelter
<ssc.open@googlemail.com> wrote:
> I'm also getting errors on a test when executing all tests. Don't get
> the error when I run the test in the IDE or via mvn on the commandline.
>
> Do we now also have intra-test class parallelism? If yes, is there a way
> to disable this?
>
> --sebastian
>
>
> On 07.06.2013 09:11, Ted Dunning wrote:
>> This last one is actually more like a non-deterministic test that probably
>> needs a restart strategy to radically decrease the probability of failure
>> or needs a slightly more relaxed threshold.
>>
>>
>>
>> On Fri, Jun 7, 2013 at 7:32 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>
>>> Here's another one:
>>> testClustering(org.apache.mahout.clustering.streaming.cluster.BallKMeansTest)  Time elapsed: 2.817 sec  <<< FAILURE!
>>> java.lang.AssertionError: expected:<625.0> but was:<753.0>
>>>         at org.junit.Assert.fail(Assert.java:88)
>>>         at org.junit.Assert.failNotEquals(Assert.java:743)
>>>         at org.junit.Assert.assertEquals(Assert.java:494)
>>>         at org.junit.Assert.assertEquals(Assert.java:592)
>>>         at org.apache.mahout.clustering.streaming.cluster.BallKMeansTest.testClustering(BallKMeansTest.java:119)
>>>
>>>
>>> I suspect that we still have issues w/ the parallel testing, as it doesn't
>>> show up in repeated runs and it isn't consistent.
>>>
>>> On Jun 7, 2013, at 6:10 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>
>>>> testTranspose(org.apache.mahout.math.hadoop.TestDistributedRowMatrix)  Time elapsed: 1.569 sec  <<< ERROR!
>>>> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/tmp/mahout-TestDistributedRowMatrix-8146721276637462528/testdata/transpose-24 already exists
>>>>       at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
>>>>       at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
>>>>       at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
>>>>       at java.security.AccessController.doPrivileged(Native Method)
>>>>       at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>>>>       at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
>>>>       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:886)
>>>>       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
>>>>       at org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:238)
>>>>       at org.apache.mahout.math.hadoop.TestDistributedRowMatrix.testTranspose(TestDistributedRowMatrix.java:87)
>>>>
>>>>
>>>> Anyone seen this?  I'm guessing there are some conflicts due to the order
>>>> methods are run in.
>>>
>>> --------------------------------------------
>>> Grant Ingersoll | @gsingers
>>> http://www.lucidworks.com
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
