accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Accumulo on Azure / WebHDFS
Date Sat, 15 Apr 2017 18:25:15 GMT
As I understand it, S3 is currently still a non-starter.

Long term, Amazon may provide more features to fix the sync issue. Or
someone could modify Accumulo to support putting RFiles on S3 exclusively.

Happy to expand on this further if you're curious.
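
To make the sync issue concrete, here is a minimal sketch of the "write, sync, then trust the data is there" contract, using Python against a local filesystem as an analogy. It is not Accumulo code: Accumulo is written in Java and goes through Hadoop's filesystem layer, and the names `append_durably` and `read_back` are hypothetical helpers made up for this illustration.

```python
import os


def append_durably(path: str, record: bytes) -> int:
    """Append a record and force it to stable storage before returning.

    Returns the file size after the append. A caller that gets a return
    value is entitled to assume the record survives a crash.
    """
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # move data from Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to persist it to the device
    return os.path.getsize(path)


def read_back(path: str) -> bytes:
    """Read the whole file, as a recovery process would after a restart."""
    with open(path, "rb") as f:
        return f.read()
```

A store that acknowledges the sync call without actually persisting the bytes breaks this contract, because a recovery pass like `read_back` could then miss records that the writer was told were safe.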


On Apr 14, 2017 15:16, "James Hughes" <jnh5y@virginia.edu> wrote:

Hi Josh,

Thanks!  Sounds like Azure's offerings provide better performance and
sync() behavior than S3?  (I.e., is S3 still a no-go for Accumulo?)

Your description of WebHDFS makes total sense.  I figured there might be an
outside chance that WebHDFS handled or worked around limitations from S3,
etc.

Cheers,

Jim

On Fri, Apr 14, 2017 at 12:47 PM, Josh Elser <josh.elser@gmail.com> wrote:

> Hi Jim,
>
> I can say that Accumulo will work on Azure's blob store and their data
> lake store. This is a result of testing I'm involved with at
> Hortonworks (day job). I know that these filesystems are tested to an
> appropriate degree, proving that they do provide the things that
> Accumulo needs.
>
> As a refresher, the things we need from a filesystem are: performance
> (Accumulo's write performance is pretty dominated by I/O) and
> durability guarantees (when we call sync() on a file, the data we just
> wrote better be there).
>
> For WebHDFS, I think performance would suffer, and I would be
> surprised if it actually provided the durability guarantees. My
> understanding is that WebHDFS is meant more to give non-Java clients
> easy access to HDFS (as a one-off) than to act as a fully fledged
> access layer.
>
> - Josh
>
> On Fri, Apr 14, 2017 at 10:16 AM, James Hughes <jnh5y@virginia.edu> wrote:
> > Hi all,
> >
> > I know folks have asked about Accumulo on S3 before (1).
> >
> > Has anyone tried running Accumulo on Azure's blob storage or data lake
> > solutions (2)?  (Or perhaps more generally, has anyone tried Accumulo on
> > WebHDFS?)
> >
> > As more background, I have deployed Accumulo on HDP clouds in Azure, and
> > that works great.  I'm interested in using the blob / data lake storage
> > for benefits with scaling, etc.
> >
> > Thanks in advance,
> >
> > Jim
> >
> > 1.  http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-td16737.html
> > 2.  https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-integrate-with-other-services
>
