commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex D Herbert (JIRA)" <j...@apache.org>
Subject [jira] [Created] (RNG-54) StringSampler
Date Wed, 15 Aug 2018 13:37:00 GMT
Alex D Herbert created RNG-54:
---------------------------------

             Summary: StringSampler
                 Key: RNG-54
                 URL: https://issues.apache.org/jira/browse/RNG-54
             Project: Commons RNG
          Issue Type: Improvement
          Components: sampling
    Affects Versions: 1.1
            Reporter: Alex D Herbert


There is currently no equivalent for the function {{org.apache.commons.math3.random.RandomDataGenerator.nextHexString(int)}}.

Here is the original version adapted to use the {{UniformRandomProvider:}}
{code:java}
public String nextHexStringOriginal(UniformRandomProvider ran, int len) {

    // Initialize output buffer
    StringBuilder outBuffer = new StringBuilder();

    // Get int(len/2)+1 random bytes
    byte[] randomBytes = new byte[(len / 2) + 1];
    ran.nextBytes(randomBytes);

    // Convert each byte to 2 hex digits
    for (int i = 0; i < randomBytes.length; i++) {
        Integer c = Integer.valueOf(randomBytes[i]);

        /*
         * Add 128 to byte value to make interval 0-255 before doing hex conversion.
         * This guarantees <= 2 hex digits from toHexString() toHexString would
         * otherwise add 2^32 to negative arguments.
         */
        String hex = Integer.toHexString(c.intValue() + 128);

        // Make sure we add 2 hex digits for each byte
        if (hex.length() == 1) {
            hex = "0" + hex;
        }
        outBuffer.append(hex);
    }
    return outBuffer.toString().substring(0, len);
}
{code}
Note: I removed the length check to make the speed test (see below) fair.

This makes use of {{StringBuider}} and is not very efficient. I have created a version based
on the Hex encoding within {{org.apache.commons.codec.digest.DigestUtils}} and {{org.apache.commons.codec.binary.Hex}}.
This uses a direct look-up of the hex character using the index from successive 4 bits of
a byte array to form an index from 0-15.

Here's the function without details of how the {{byte[]}} is correctly sized:
{code:java}
private static final char[] DIGITS_LOWER = { 
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };

private static String nextHexString(UniformRandomProvider rng, byte[] bytes, int length) {
    rng.nextBytes(bytes);
    // Use the upper and lower 4 bits of each byte as an
    // index in the range 0-15 for each hex character.
    final char[] out = new char[length];
    // Run the loop without checking index j by producing characters
    // up to the size below the desired length.
    final int loopLimit = length / 2;
    int i = 0, j = 0;
    while (i < loopLimit) {
        final byte b = bytes[i];
        // 0x0F == 0x01 | 0x02 | 0x04 | 0x08
        out[j++] = DIGITS_LOWER[(b >>> 4) & 0x0F];
        out[j++] = DIGITS_LOWER[b & 0x0F];
        i++;
    }
    // The final character
    if (j < length)
        out[j++] = DIGITS_LOWER[(bytes[i] >>> 4) & 0x0F];
    return new String(out);
}
{code}
I've compared this to the original function and a modified one below that computes the exact
same strings:
{code:java}
public String nextHexStringModified(UniformRandomProvider ran, int len) {

    // Initialize output buffer
    StringBuilder outBuffer = new StringBuilder();

    // byte[] randomBytes = new byte[(len/2) + 1]; // ORIGINAL
    byte[] randomBytes = new byte[(len + 1) / 2];
    ran.nextBytes(randomBytes);

    // Convert each byte to 2 hex digits
    for (int i = 0; i < randomBytes.length; i++) {

        // ORIGINAL
        // Integer c = Integer.valueOf(randomBytes[i]);
        // String hex = Integer.toHexString(c.intValue() + 128);

        String hex = Integer.toHexString(randomBytes[i] & 0xff);

        // Make sure we add 2 hex digits for each byte
        if (hex.length() == 1) {
            outBuffer.append('0');
        }
        outBuffer.append(hex);
    }
    return outBuffer.toString().substring(0, len);
}
{code}
The timings are:

 
||Name||Time||Relative||
|StringSampler|316103|0.073|
|nextHexStringModified|3708104|0.853|
|nextHexStringOriginal|4348063|1.000|

This is not using JMH but the results show the method performs better.

The full {{StringSampler}} class supports a radix of 2, 8, and 16 for binary, octal and hex
strings.

JUnit tests show: the sampler computes the same values as {{nextHexStringModified(int);}} edges
cases are handled with exceptions; and the output strings are uniform for each of the supported
character sets (using a Chi Squared test).

Can I create a PR for a {{org.apache.commons.rng.sampling.StringSampler}}?

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message