accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Accumulo on s3
Date Mon, 25 Apr 2016 15:09:40 GMT
I'm not sure about the guarantees of S3 (much less those of the s3 or s3a 
Hadoop FileSystem implementations), but historically the common issue is 
a missing or incorrect implementation of sync(). For durability (read: 
not losing your data), Accumulo *must* know that when it calls sync() on 
a file, the data is persisted.
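
To make that contract concrete, here is a minimal sketch of "append, then
sync before acknowledging". It is not Accumulo code: it assumes a local
file and uses java.nio's FileChannel.force() as a stand-in for the sync
call a Hadoop FileSystem stream would have to honor. The class and method
names (DurableAppend, appendAndSync) are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DurableAppend {

    // Hypothetical helper: appends one record and returns only after the
    // OS reports that the bytes (and file metadata) reached stable storage.
    static void appendAndSync(Path log, byte[] record) throws IOException {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record);
            // write() may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // The local-file analogue of sync(): if this returned without
            // persisting anything, a crash right after could lose the record.
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("wal-sketch", ".log");
        appendAndSync(log, "mutation-1\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("bytes on disk: " + Files.size(log));
        Files.delete(log);
    }
}
```

The caller treats a record as safe only once appendAndSync() returns. A
FileSystem implementation whose sync is a no-op, or which only buffers
until close(), silently breaks that assumption.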

I don't know definitively what S3 guarantees (or claims to guarantee), 
but I would be very wary until I had run some tests. Accumulo has one 
good test for this, called continuous ingest, which can run for days and 
verify data integrity.

You might have luck reaching out to the Hadoop community to get some 
understanding from them about what can reasonably be expected with the 
current S3 FileSystem implementations, and then run your own tests to 
make sure that data is not lost.

vdelmeglio wrote:
> Hi everyone,
>
> I recently got this answer on stackoverflow (link:
> http://stackoverflow.com/questions/36602719/accumulo-cluster-in-aws-with-s3-not-really-stable/36772874#36772874):
>
>
>>   Yes, I would expect that running Accumulo with S3 would result in
>> problems. Even though S3 has a FileSystem implementation, it does not
>> behave like a normal file system. Some examples of the differences are
>> that operations we would expect to be atomic are not atomic in S3,
>> exceptions may mean different things than we expect, and we assume our
>> view of files and their metadata is consistent rather than the eventual
>> consistency S3 provides.
>>
>> It's possible these issues could be mitigated if we made some
>> modifications to the Accumulo code, but as far as I know no one has tried
>> running Accumulo on S3 to figure out the problems and whether those could
>> be fixed or not.
>
> Since we're currently running an Accumulo cluster on AWS with S3 for
> evaluation purposes, this answer makes me wonder: could someone explain
> why running Accumulo on S3 is not a good idea? Specifically, which
> operations does Accumulo expect to be atomic?
>
> Is there a roadmap for S3 compatibility?
>
> Thanks!
> Valerio
>
>
>
> --
> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-tp16737.html
> Sent from the Developers mailing list archive at Nabble.com.
