commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gilles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (RNG-54) StringSampler
Date Thu, 16 Aug 2018 21:00:00 GMT

    [ https://issues.apache.org/jira/browse/RNG-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583056#comment-16583056
] 

Gilles commented on RNG-54:
---------------------------

{quote}matched by using Commons Codec and so if anything the nextHexString method should probably
be added to that library.
{quote}
Not necessarily. Where it might end up depends on several factors, like the avoidance of dependencies.
This even sometimes warrants functionality duplication among Commons components. Thus, "Commons
Text" might be interested, the more so that there had been a long discussion to arrive at
the API that allows direct use of "Commons RNG" without introducing a dependency.

You should post to the ML.

> StringSampler
> -------------
>
>                 Key: RNG-54
>                 URL: https://issues.apache.org/jira/browse/RNG-54
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: sampling
>    Affects Versions: 1.1
>            Reporter: Alex D Herbert
>            Priority: Minor
>
> There is currently no equivalent for the function {{org.apache.commons.math3.random.RandomDataGenerator.nextHexString(int)}}.
> Here is the original version adapted to use the {{UniformRandomProvider:}}
> {code:java}
> public String nextHexStringOriginal(UniformRandomProvider ran, int len) {
>     // Initialize output buffer
>     StringBuilder outBuffer = new StringBuilder();
>     // Get int(len/2)+1 random bytes
>     byte[] randomBytes = new byte[(len / 2) + 1];
>     ran.nextBytes(randomBytes);
>     // Convert each byte to 2 hex digits
>     for (int i = 0; i < randomBytes.length; i++) {
>         Integer c = Integer.valueOf(randomBytes[i]);
>         /*
>          * Add 128 to byte value to make interval 0-255 before doing hex conversion.
>          * This guarantees <= 2 hex digits from toHexString() toHexString would
>          * otherwise add 2^32 to negative arguments.
>          */
>         String hex = Integer.toHexString(c.intValue() + 128);
>         // Make sure we add 2 hex digits for each byte
>         if (hex.length() == 1) {
>             hex = "0" + hex;
>         }
>         outBuffer.append(hex);
>     }
>     return outBuffer.toString().substring(0, len);
> }
> {code}
> Note: I removed the length check to make the speed test (see below) fair.
> This makes use of {{StringBuider}} and is not very efficient. I have created a version
based on the Hex encoding within {{org.apache.commons.codec.digest.DigestUtils}} and {{org.apache.commons.codec.binary.Hex}}.
This uses a direct look-up of the hex character using the index from successive 4 bits of
a byte array to form an index from 0-15.
> Here's the function without details of how the {{byte[]}} is correctly sized:
> {code:java}
> private static final char[] DIGITS_LOWER = { 
>     '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
> private static String nextHexString(UniformRandomProvider rng, byte[] bytes, int length)
{
>     rng.nextBytes(bytes);
>     // Use the upper and lower 4 bits of each byte as an
>     // index in the range 0-15 for each hex character.
>     final char[] out = new char[length];
>     // Run the loop without checking index j by producing characters
>     // up to the size below the desired length.
>     final int loopLimit = length / 2;
>     int i = 0, j = 0;
>     while (i < loopLimit) {
>         final byte b = bytes[i];
>         // 0x0F == 0x01 | 0x02 | 0x04 | 0x08
>         out[j++] = DIGITS_LOWER[(b >>> 4) & 0x0F];
>         out[j++] = DIGITS_LOWER[b & 0x0F];
>         i++;
>     }
>     // The final character
>     if (j < length)
>         out[j++] = DIGITS_LOWER[(bytes[i] >>> 4) & 0x0F];
>     return new String(out);
> }
> {code}
> I've compared this to the original function and a modified one below that computes the
exact same strings:
> {code:java}
> public String nextHexStringModified(UniformRandomProvider ran, int len) {
>     // Initialize output buffer
>     StringBuilder outBuffer = new StringBuilder();
>     // byte[] randomBytes = new byte[(len/2) + 1]; // ORIGINAL
>     byte[] randomBytes = new byte[(len + 1) / 2];
>     ran.nextBytes(randomBytes);
>     // Convert each byte to 2 hex digits
>     for (int i = 0; i < randomBytes.length; i++) {
>         // ORIGINAL
>         // Integer c = Integer.valueOf(randomBytes[i]);
>         // String hex = Integer.toHexString(c.intValue() + 128);
>         String hex = Integer.toHexString(randomBytes[i] & 0xff);
>         // Make sure we add 2 hex digits for each byte
>         if (hex.length() == 1) {
>             outBuffer.append('0');
>         }
>         outBuffer.append(hex);
>     }
>     return outBuffer.toString().substring(0, len);
> }
> {code}
> The timings are:
>  
> ||Name||Time||Relative||
> |StringSampler|316103|0.073|
> |nextHexStringModified|3708104|0.853|
> |nextHexStringOriginal|4348063|1.000|
> This is not using JMH but the results show the method performs better.
> The full {{StringSampler}} class supports a radix of 2, 8, and 16 for binary, octal and
hex strings.
> JUnit tests show: the sampler computes the same values as {{nextHexStringModified(int);}} edges
cases are handled with exceptions; and the output strings are uniform for each of the supported
character sets (using a Chi Squared test).
> Can I create a PR for a {{org.apache.commons.rng.sampling.StringSampler}}?
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message