accumulo-dev mailing list archives

From Dylan Hutchison
Subject Re: Accumulo on s3
Date Mon, 25 Apr 2016 17:13:25 GMT
Hey Josh,

Are there other platforms on AWS (or another cloud provider) that
Accumulo/HDFS runs well on?  I thought I remembered you and
others running the agitation tests on Amazon instances during
release-testing time.  If there are alternatives, what advantages would S3
have over the current method?

On Mon, Apr 25, 2016 at 8:09 AM, Josh Elser wrote:

> I'm not sure of the guarantees of s3 (much less the s3 or s3a Hadoop
> FileSystem implementations), but, historically, the common issue is
> lacking/incorrect implementations of sync(). For durability (read-as: not
> losing your data), Accumulo *must* know that when it calls sync() on a
> file, the data is persisted.
>
> I don't know definitively what S3 guarantees (or asserts to guarantee),
> but I would be very afraid until I ran some testing (we have one good test
> in Accumulo, called continuous ingest, that can run for days and verify
> data integrity).
>
> You might have luck reaching out to the Hadoop community to get some
> understanding from them about what can reasonably be expected with the
> current S3 FileSystem implementations, and then run your own tests to make
> sure that data is not lost.
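
To make the sync() point above concrete, here is a minimal local-filesystem
sketch in Java. It is not Accumulo or Hadoop code: SyncedLogSketch,
appendDurably, and readAll are hypothetical names, and FileChannel.force(true)
on a local file only stands in for the sync() call discussed above. The
contract it illustrates is the one described: the writer should not treat a
record as safe until the sync call has returned.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical illustration only: a local-file analogue of "append, then sync".
final class SyncedLogSketch {

    // Appends one record and returns only after force(true) has completed,
    // i.e. after the OS reports that data and metadata reached the storage device.
    static void appendDurably(Path log, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // The durability point: if this call were a no-op (the failure mode
            // described above), a crash after returning could lose the record.
            ch.force(true);
        }
    }

    // Reads back every record, one per line.
    static List<String> readAll(Path log) throws IOException {
        return Files.readAllLines(log, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("wal-sketch", ".log");
        appendDurably(log, "mutation-1");
        appendDurably(log, "mutation-2");
        System.out.println(readAll(log));
        Files.delete(log);
    }
}
```

A FileSystem implementation whose sync is missing or incorrect would still let
this code run without errors, which is why a long-running integrity test is
needed to catch it.
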
> vdelmeglio wrote:
>> Hi everyone,
>> I recently got this answer on stackoverflow (link:
>> ):
>>> Yes, I would expect that running Accumulo with S3 would result in
>>> problems. Even though S3 has a FileSystem implementation, it does not
>>> behave like a normal file system. Some examples of the differences are
>>> that operations we would expect to be atomic are not atomic in S3,
>>> exceptions may mean different things than we expect, and we assume our
>>> view of files and their metadata is consistent rather than the eventual
>>> consistency S3 provides.
>>> It's possible these issues could be mitigated if we made some
>>> modifications to the Accumulo code, but as far as I know no one has tried
>>> running Accumulo on S3 to figure out the problems and whether those could
>>> be fixed or not.
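
As an illustration of what "operations we would expect to be atomic" can mean,
here is a minimal Java sketch of the write-to-temp-then-rename pattern on a
local file system. Rename is used only as an example of such an operation;
that it is the one that matters for Accumulo is an assumption here, and
AtomicPublishSketch and publish are hypothetical names, not Accumulo APIs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only: publish a file so readers never see it half-written.
final class AtomicPublishSketch {

    // Writes contents to a temporary file in dir, then renames it to its final name.
    static Path publish(Path dir, String name, String contents) throws IOException {
        Path tmp = Files.createTempFile(dir, name, ".tmp");
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        Path target = dir.resolve(name);
        // ATOMIC_MOVE asks for an all-or-nothing rename and throws
        // AtomicMoveNotSupportedException if the file system cannot provide one.
        // A store that emulates rename with separate steps offers no such guarantee.
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("publish-sketch");
        Path published = publish(dir, "data.rf", "key=value\n");
        System.out.println(published + " exists: " + Files.exists(published));
        Files.delete(published);
        Files.delete(dir);
    }
}
```

On a file system with atomic rename, a reader either finds no file under the
final name or finds the complete file. Code written with that expectation can
misbehave if the rename is not atomic or if a listing does not yet reflect it.
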
>> Since we're currently running an Accumulo cluster on AWS with S3 for
>> evaluation purposes, this answer makes me wonder: could someone explain
>> why running Accumulo on S3 is not a good idea? Specifically, which
>> operations are expected to be atomic in Accumulo?
>> Is there a roadmap for S3 compatibility?
>> Thanks!
>> Valerio
