accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Accumulo on s3
Date Mon, 25 Apr 2016 18:12:03 GMT
Yeah, EC2's EBS and ephemeral storage are fine AFAIK. I just don't know
much of anything about S3 (which might be why I'm inherently so
pessimistic about it working :P).

Dylan Hutchison wrote:
> Hey Josh,
>
> Are there other platforms on AWS (or another cloud provider) that
> Accumulo/HDFS are friendly to run on?  I thought I remembered you and
> others running the agitation tests on Amazon instances during
> release-testing time.  If there are alternatives, what advantages would S3
> have over the current method?
>
> On Mon, Apr 25, 2016 at 8:09 AM, Josh Elser <josh.elser@gmail.com> wrote:
>
>> I'm not sure about the guarantees of S3 (much less the s3 or s3a Hadoop
>> FileSystem implementations), but, historically, the common issue is
>> lacking/incorrect implementations of sync(). For durability (read as: not
>> losing your data), Accumulo *must* know that when it calls sync() on a
>> file, the data is persisted.
>>
>> I don't know definitively what S3 guarantees (or asserts to guarantee),
>> but I would be very afraid until I ran some testing (Accumulo has one good
>> test, called continuous ingest, that can run for days and verify data
>> integrity).
>>
>> You might have luck reaching out to the Hadoop community to get some
>> understanding from them about what can reasonably be expected with the
>> current S3 FileSystem implementations, and then run your own tests to make
>> sure that data is not lost.
>>
>>
>> vdelmeglio wrote:
>>
>>> Hi everyone,
>>>
>>> I recently got this answer on stackoverflow (link:
>>>
>>> http://stackoverflow.com/questions/36602719/accumulo-cluster-in-aws-with-s3-not-really-stable/36772874#36772874
>>> ):
>>>
>>>
>>>> Yes, I would expect that running Accumulo with S3 would result in
>>>> problems. Even though S3 has a FileSystem implementation, it does not
>>>> behave like a normal file system. Some examples of the differences are
>>>> that operations we would expect to be atomic are not atomic in S3,
>>>> exceptions may mean different things than we expect, and we assume our
>>>> view of files and their metadata is consistent rather than the eventual
>>>> consistency S3 provides.
>>>>
>>>> It's possible these issues could be mitigated if we made some
>>>> modifications to the Accumulo code, but as far as I know no one has tried
>>>> running Accumulo on S3 to figure out the problems and whether those could
>>>> be fixed or not.
>>>>
>>> Since we're currently running an Accumulo cluster on AWS with S3 for
>>> evaluation purposes, this answer makes me wonder: could someone explain to
>>> me why running Accumulo on S3 is not a good idea? Specifically, which
>>> operations are expected to be atomic in Accumulo?
>>>
>>> Is there perhaps a roadmap for S3 compatibility?
>>>
>>> Thanks!
>>> Valerio
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-tp16737.html
>>> Sent from the Developers mailing list archive at Nabble.com.
>>>
>
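
The durability contract described in the thread (a write counts as safe only once sync() has returned) can be illustrated with a local-file analogue. This is a minimal sketch, not Accumulo or Hadoop code: `durable_append` and `read_records` are hypothetical names, and `os.fsync` on a local file stands in for the sync call on a Hadoop output stream. Whether a given S3 FileSystem implementation honors the same contract is exactly what the thread says needs testing.

```python
import os
import tempfile


def durable_append(path: str, record: bytes) -> None:
    """Append a record; return only after the OS reports it persisted."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # move data from Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush the data to the device
    # Only at this point may the caller acknowledge the write.
    # (Assumption: the underlying store really persists data on fsync.)


def read_records(path: str) -> list:
    """Read back all newline-terminated records for verification."""
    with open(path, "rb") as f:
        return f.read().splitlines()


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
    durable_append(log_path, b"row1\n")
    durable_append(log_path, b"row2\n")
    print(read_records(log_path))
```

A store whose sync call returns before the data is persisted would still pass a read-back check like this one while the process is alive; the loss only shows up after a crash, which is why long-running integrity tests are suggested above.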

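The thread does not say which operations Accumulo expects to be atomic. As an illustration only, one operation that file-system-based software commonly relies on is rename: write a file under a temporary name, then rename it into place so readers see either the old file or the complete new one. The sketch below shows that pattern on a local POSIX-style file system; `publish_atomically` is a hypothetical helper, and applying this to Accumulo is an assumption, not something the thread confirms. The Hadoop S3 connectors emulate rename by copying and deleting objects, so this guarantee does not carry over there.

```python
import os
import tempfile


def publish_atomically(final_path: str, data: bytes) -> None:
    """Write data to a temp file, then rename it over final_path."""
    directory = os.path.dirname(final_path) or "."
    # Create the temp file in the same directory so the rename
    # stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic on POSIX file systems: readers never
        # observe a partially written final_path.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "data.rf")
    publish_atomically(target, b"v1")
    publish_atomically(target, b"v2")
    with open(target, "rb") as f:
        print(f.read())
```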