hbase-issues mailing list archives

From "Tatsuya Kawano (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2269) PerformanceEvaluation "--nomapred" may assign duplicate random seed over multiple testing threads
Date Sat, 06 Mar 2010 23:22:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842341#action_12842341 ]

Tatsuya Kawano commented on HBASE-2269:
---------------------------------------

@Stack Yes, at least on the Sun JDK, we can expect {{super.hashCode()}} (that is, {{java.lang.Object#hashCode()}})
to return a different integer for each instance of Test.

Quote from JDK 6 Javadoc: 
http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode()

{quote}
As much as is reasonably practical, the hashCode method defined by class Object does return
distinct integers for distinct objects. (This is typically implemented by converting the internal
address of the object into an integer, but this implementation technique is not required by
the Java programming language.)
{quote}

There is no chance of different objects on the same Java VM getting the same internal address.

There is only a very small chance of different objects on different Java VMs getting the
same internal address.
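
To illustrate the hashCode-based seeding described above, here is a minimal sketch. The names HashSeedDemo and distinctSeeds are hypothetical and not part of PerformanceEvaluation, and the nested Test class is only a stand-in for the real one. Distinct values are what the Javadoc says is typical, not something the language guarantees.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class HashSeedDemo {
    // Hypothetical stand-in for PerformanceEvaluation's Test class.
    static class Test {
        // Seed from Object#hashCode(); distinct values are typical, not guaranteed.
        protected final Random rand = new Random(super.hashCode());
        // Kept only so the demo can inspect which seed was used.
        final int seed = super.hashCode();
    }

    // Creates `count` Test instances (all kept alive) and counts the distinct seeds.
    static int distinctSeeds(int count) {
        Test[] tests = new Test[count];
        Set<Integer> seeds = new HashSet<Integer>();
        for (int i = 0; i < count; i++) {
            tests[i] = new Test();
            seeds.add(tests[i].seed);
        }
        return seeds.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctSeeds(8) + " distinct seeds out of 8 instances");
    }
}
```

The instances are held in an array on purpose, so that every object is live at the same time while the seeds are compared.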


Or, if you'd like to avoid this implementation-specific behavior, we could do something like:


{code:title=In Test class}
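// Shared by all Test instances in this JVM; seeded once from the clock.
private static final Random randomSeed = new Random(System.currentTimeMillis());
// Each call returns the next value from the shared generator.
private static long nextRandomSeed() { return randomSeed.nextLong(); }
// Per-instance generator, seeded from the shared one rather than from the clock.
protected final Random rand = new Random(nextRandomSeed());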
{code}

We can expect different map tasks to get different currentTimeMillis() values, and different
threads in the same Java VM to get different nextLong() values.
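
As a rough check of the idea above, the following sketch has several threads each draw one seed from a single shared generator. SharedSeedDemo and collectSeeds are hypothetical names used only for this illustration; it relies on java.util.Random being safe to call from multiple threads.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class SharedSeedDemo {
    // One seed generator per JVM, seeded once from the clock.
    private static final Random randomSeed = new Random(System.currentTimeMillis());

    static long nextRandomSeed() { return randomSeed.nextLong(); }

    // Starts `threads` threads, each drawing one seed, and returns the distinct seeds seen.
    static Set<Long> collectSeeds(int threads) {
        final Set<Long> seeds = Collections.synchronizedSet(new HashSet<Long>());
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() { seeds.add(nextRandomSeed()); }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return seeds;
    }

    public static void main(String[] args) {
        System.out.println(collectSeeds(4).size() + " distinct seeds from 4 threads");
    }
}
```

Unlike seeding each instance from currentTimeMillis(), the number of distinct seeds here does not depend on how quickly the threads are created.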


Also, {{java.security.SecureRandom}} has a {{getSeed(int numBytes)}} method, and on Linux it
returns random values based on environmental noise collected by the OS. However, I don't
recommend this because it uses Linux {{/dev/random}} as the source and blocks when
{{/dev/random}} runs out of data in its entropy pool. ( http://en.wikipedia.org/wiki//dev/random )
We don't need this level of strength anyway.

Thanks, 


> PerformanceEvaluation "--nomapred" may assign duplicate random seed over multiple testing threads
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2269
>                 URL: https://issues.apache.org/jira/browse/HBASE-2269
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.20.3
>         Environment: Any operating system
>            Reporter: Tatsuya Kawano
>            Priority: Minor
>
> When you use PerformanceEvaluation with the "--nomapred" option, you will end up with the
> same random seed assigned to multiple testing threads, so you'll get inaccurate results
> from the "random~~" tests.
> {code:title=PerformanceEvaluation.java}
> abstract class Test {
>     protected final Random rand = new Random(System.currentTimeMillis());
> {code}
> Millisecond resolution is not sufficient; today's JVMs are fast enough to create multiple Test
> objects within one millisecond. You might want to use something like "{{super.hashCode()}}" instead.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

