accumulo-user mailing list archives

From David Medinets <david.medin...@gmail.com>
Subject Re: File hash key case observation
Date Fri, 06 Dec 2013 04:31:33 GMT
Are you working to ingest a large number of files into Accumulo?


On Thu, Dec 5, 2013 at 11:30 PM, David Medinets <david.medinets@gmail.com> wrote:

> After ingesting a few million files using the method in the Accumulo File
> System Archive (http://accumulo.apache.org/1.4/examples/dirlist.html), we
> ran into a problem reading the information back out of Accumulo. I forget
> the error, but I resolved it by using DigestUtils.md5Hex instead of
> DigestUtils.md5, which stored the MD5 as a hex string instead of a binary
> value. We did not dig into what caused the error; we just side-stepped it.
>
>
> On Wed, Dec 4, 2013 at 11:37 PM, Chris Carrino <carrino.dev@gmail.com> wrote:
>
>> The org.apache.accumulo.examples.simple.filedata.FileDataIngest class
>> generates LOWERCASE hash keys via the hexString() method, and uses them as
>> row IDs for storing file chunks in Accumulo.  Note that NIST uses
>> UPPERCASE hash keys in the Reference Data Set (RDS).  See
>> http://www.nsrl.nist.gov/ for the RDS.  Both approaches are valid since
>> the hexadecimal representation of the key is not case sensitive, but make
>> sure you normalize to one case if you are comparing the keys generated in
>> the FileDataIngest class to the RDS keys.
>>
>
>
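For anyone comparing keys across the two sources, here is a minimal sketch of the two points in this thread: storing the MD5 as a hex string rather than raw bytes, and normalizing case before comparing. It is not the FileDataIngest code. The class and method names (HashKeys, md5Hex, normalize, sameKey) are made up for illustration, and it uses the JDK's MessageDigest rather than commons-codec's DigestUtils so that it runs without extra dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of Accumulo or commons-codec.
final class HashKeys {

    // MD5 of the input, rendered as a 32-character lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Assumption: lowercase is the chosen canonical form; uppercase works equally well.
    static String normalize(String hexKey) {
        return hexKey.trim().toLowerCase(Locale.ROOT);
    }

    // Compare two hex keys regardless of the case each source uses.
    static boolean sameKey(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        String local = md5Hex("abc".getBytes(StandardCharsets.UTF_8));
        String rdsStyle = local.toUpperCase(Locale.ROOT); // uppercase, as in the RDS

        System.out.println(local.equals(rdsStyle));   // plain string comparison
        System.out.println(sameKey(local, rdsStyle)); // comparison after normalizing
    }
}
```

Whichever case you settle on, apply it on both the ingest side and the lookup side, since the row IDs are compared as exact strings.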
