datafu-dev mailing list archives

From "Matthew Hayes (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (DATAFU-46) Hash UDFs should return zero-padded strings of uniform length even when leading bits are zero
Date Mon, 02 Nov 2015 17:58:27 GMT

     [ https://issues.apache.org/jira/browse/DATAFU-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Hayes closed DATAFU-46.
-------------------------------

> Hash UDFs should return zero-padded strings of uniform length even when leading bits are zero
> ---------------------------------------------------------------------------------------------
>
>                 Key: DATAFU-46
>                 URL: https://issues.apache.org/jira/browse/DATAFU-46
>             Project: DataFu
>          Issue Type: Bug
>            Reporter: Matthew Hayes
>            Assignee: Philip (flip) Kromer
>             Fix For: 1.3.0
>
>         Attachments: 0001-Hash-UDFs-return-zero-padded-strings-of-uniform-leng.patch
>
>
> Reported by Philip Kromer here:
> https://github.com/linkedin/datafu/issues/93
> Details reported there by Philip:
> ---------------------
> The Hash UDFs in 'hex' mode currently do not always return a string of the same length,
> because BigInteger.toString() omits leading zeros. So amid a stream in which about 94% of
> the strings have the same length, 1/16th are shorter by one or more characters, 1/256th by
> two or more, and in the unlikely case that an MD5 hash's value was 124 zero bits followed
> by 4 one bits, it would return the one-character-long string 'f'.
> This is surprising behavior, and a trap for those practicing the frequent trick of generating
> a hash and chopping off just the number of bits they need:
> {code}
> -- returns one-fifteenth, not one-sixteenth, of the input.
> sampled_lines = FILTER(FOREACH lines GENERATE MD5(val) AS digest, val) BY (STARTSWITH(digest, 'f'));
> {code}
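
For illustration, here is a minimal Java sketch of the behavior described above and one possible way to pad the result. It is not taken from the attached patch. The class and method names (HexPadding, unpaddedHex, paddedHex, md5) are hypothetical, and the sketch assumes the hex string is produced with BigInteger.toString(16).

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class for illustration only; not part of DataFu.
final class HexPadding {

    // Mirrors the reported behavior: toString(16) drops leading zero digits.
    static String unpaddedHex(byte[] digest) {
        return new BigInteger(1, digest).toString(16);
    }

    // Left-pads with '0' so the result always has two hex characters per byte.
    static String paddedHex(byte[] digest) {
        String hex = new BigInteger(1, digest).toString(16);
        int width = digest.length * 2;
        StringBuilder sb = new StringBuilder(width);
        for (int i = hex.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(hex).toString();
    }

    // Computes the MD5 digest of a string encoded as UTF-8.
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 124 zero bits followed by 4 one bits, as in the report.
        byte[] mostlyZero = new byte[16];
        mostlyZero[15] = 0x0f;

        System.out.println(unpaddedHex(mostlyZero)); // one character: f
        System.out.println(paddedHex(mostlyZero));   // 32 characters, zero-padded
        System.out.println(paddedHex(md5("example")));
    }
}
```

With padding, every MD5 digest renders as 32 hex characters, so a prefix test such as STARTSWITH(digest, 'f') keeps about 1/16 of the input. Without padding, a value whose first nonzero hex digit is 'f' also matches once its leading zeros are dropped, which is where the roughly 1/16 + 1/256 + ... = 1/15 figure comes from.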



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
