accumulo-user mailing list archives

From Chris Carrino <carrino....@gmail.com>
Subject Re: File hash key case observation
Date Tue, 10 Dec 2013 04:26:05 GMT
Hi David, sorry I haven't gotten back to you; the legit emails are getting
lost in the flood of mailing list emails.  I should probably unsubscribe
and just keep up to date by checking the archives.  The MD5 hashes are
available in the NIST set along with the SHA1s, but computing the SHA1s
doesn't seem to take many more CPU cycles, so I've been using those.
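
For reference, a minimal sketch of producing MD5 and SHA-1 hex digests with only the JDK's java.security.MessageDigest. The class name DigestSketch and the helpers toHex and hexDigest are made up for illustration; this is not the code in FileDataIngest, and it makes no claim about relative CPU cost.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
public class DigestSketch {

    // Hex-encode raw digest bytes as a lowercase string.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes encode as two hex digits.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Digest the data with the named algorithm ("MD5" or "SHA-1") and
    // return the result as lowercase hex text rather than raw bytes.
    static String hexDigest(String algorithm, byte[] data) {
        try {
            return toHex(MessageDigest.getInstance(algorithm).digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Digest not available: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[0]; // empty input, whose digests are well known
        System.out.println(hexDigest("MD5", data));
        System.out.println(hexDigest("SHA-1", data));
    }
}
```

Storing the hex text instead of the raw digest bytes keeps the key printable, which matches the md5Hex workaround described in the quoted message below.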


On Thu, Dec 5, 2013 at 11:31 PM, David Medinets <david.medinets@gmail.com> wrote:

> Are you working to ingest a large number of files into Accumulo?
>
>
> On Thu, Dec 5, 2013 at 11:30 PM, David Medinets <david.medinets@gmail.com> wrote:
>
>> After ingesting a few million files using the method in the Accumulo File
>> System Archive (http://accumulo.apache.org/1.4/examples/dirlist.html) we
>> ran into a problem reading the information back out of Accumulo. I forget
>> the error, but I resolved it by using DigestUtils.md5Hex instead of
>> DigestUtils.md5, which stored the MD5 as a hex string instead of a binary
>> value. We did not dig into what caused the error; we just side-stepped it.
>>
>>
>> On Wed, Dec 4, 2013 at 11:37 PM, Chris Carrino <carrino.dev@gmail.com> wrote:
>>
>>> The org.apache.accumulo.examples.simple.filedata.FileDataIngest class
>>> generates LOWERCASE hash keys via the hexString() method, and uses them as
>>> row IDs for storing file chunks in Accumulo.  Note that NIST uses
>>> UPPERCASE hash keys in the Reference Data Set (RDS).  See
>>> http://www.nsrl.nist.gov/ for the RDS.  Both approaches are valid, since
>>> the hexadecimal representation of the key is not case sensitive, but make
>>> sure you normalize to one case if you are comparing the keys generated in
>>> the FileDataIngest class to the RDS keys.
>>>
>>
>>
>
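
The case normalization suggested at the start of this thread could be sketched roughly as follows. The class HashKeyCase and its methods are hypothetical names, and lowercase is assumed as the target only because that is what the thread says FileDataIngest produces; uppercase would work equally well as long as both sides agree.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class HashKeyCase {

    // Normalize a hex hash key to lowercase. Locale.ROOT avoids
    // locale-specific case rules affecting the result.
    static String normalize(String hexKey) {
        return hexKey.trim().toLowerCase(Locale.ROOT);
    }

    // Compare two hex hash keys while ignoring letter case.
    static boolean sameKey(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        // SHA-1 of empty input, written in the two cases discussed above.
        String upperCaseKey = "DA39A3EE5E6B4B0D3255BFEF95601890AFD80709";
        String lowerCaseKey = "da39a3ee5e6b4b0d3255bfef95601890afd80709";
        System.out.println(sameKey(upperCaseKey, lowerCaseKey));
    }
}
```

Because Accumulo row IDs are compared as bytes, the normalization has to happen before a key is used for a lookup, not only when two strings are compared in memory.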
