accumulo-dev mailing list archives

From Josh Elser
Subject Re: Accumulo on s3
Date Mon, 25 Apr 2016 18:12:03 GMT
Yeah, EC2's EBS and ephemeral storage are fine AFAIK. I just don't know
much of anything about S3 (which might be why I'm inherently so
pessimistic about it working :P).

Dylan Hutchison wrote:
> Hey Josh,
> Are there other platforms on AWS (or another cloud provider) that
> Accumulo/HDFS are friendly to run on?  I thought I remembered you and
> others running the agitation tests on Amazon instances during
> release-testing time.  If there are alternatives, what advantages would S3
> have over the current method?
> On Mon, Apr 25, 2016 at 8:09 AM, Josh Elser wrote:
>> I'm not sure about the guarantees of S3 (much less the s3 or s3a Hadoop
>> FileSystem implementations), but, historically, the common issue is
>> lacking/incorrect implementations of sync(). For durability (read-as: not
>> losing your data), Accumulo *must* know that when it calls sync() on a
>> file, the data is persisted.
>> I don't know definitively what S3 guarantees (or claims to guarantee),
>> but I would be very afraid until I ran some testing (Accumulo has one
>> good test, called continuous ingest, that can run for days and verify
>> data integrity).
>> You might have luck reaching out to the Hadoop community to get some
>> understanding from them about what can reasonably be expected with the
>> current S3 FileSystem implementations, and then run your own tests to make
>> sure that data is not lost.
>> vdelmeglio wrote:
>>> Hi everyone,
>>> I recently got this answer on stackoverflow (link:
>>> ):
>>>> Yes, I would expect that running Accumulo with S3 would result in
>>>> problems. Even though S3 has a FileSystem implementation, it does not
>>>> behave like a normal file system. Some examples of the differences are
>>>> that operations we would expect to be atomic are not atomic in S3,
>>>> exceptions may mean different things than we expect, and we assume our
>>>> view of files and their metadata is consistent rather than the eventual
>>>> consistency S3 provides.
>>>> It's possible these issues could be mitigated if we made some
>>>> modifications to the Accumulo code, but as far as I know no one has tried
>>>> running Accumulo on S3 to figure out the problems and whether those could
>>>> be fixed or not.
>>> Since we're currently running an Accumulo cluster on AWS with S3 for
>>> evaluation purposes, this answer makes me wonder: could someone explain
>>> why running Accumulo on S3 is not a good idea? Specifically, which
>>> operations does Accumulo expect to be atomic?
>>> Is there a roadmap for S3 compatibility?
>>> Thanks!
>>> Valerio
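
The two properties this thread keeps coming back to (a sync() that really persists data, and operations such as rename being atomic) can be illustrated on an ordinary local file system. What follows is only a minimal sketch in Python, not Accumulo code and not the Hadoop FileSystem API: the function name durable_write and the file name wal.log are hypothetical, and it assumes a POSIX-style local file system where fsync and rename behave as documented. A store that does not give both guarantees breaks the assumption this pattern relies on, which is the concern raised above about S3.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace the file at `path` with `data` (hypothetical helper).

    A reader should see either the complete old contents or the complete
    new contents, never a partial file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename
    # stays inside one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # move Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to persist the bytes
        # Atomic on POSIX file systems. This is the kind of step that is
        # assumed atomic and may not be on an object store.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Also persist the directory entry where the platform allows it.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "wal.log")
        durable_write(target, b"mutation-1\n")
        durable_write(target, b"mutation-1\nmutation-2\n")
        with open(target, "rb") as f:
            print(f.read())
```

If fsync returned before the data was persisted, or if the rename were carried out as a copy followed by a delete, a crash in between could leave a missing or partial file. That is the failure mode a long-running integrity test such as continuous ingest is meant to catch.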