hawq-dev mailing list archives

From Paul Guo <paul...@gmail.com>
Subject Re: Questions about filesystem / filespace / tablespace
Date Wed, 15 Mar 2017 03:07:57 GMT
Hi Kyle,

I'm not sure whether I understand your point correctly, but with FUSE, which
allows userspace filesystem implementations on Linux, users mount the remote
store (e.g. S3 in your example) as if it were an ordinary local filesystem
and access it via standard syscalls like open, close, read, and write,
although some behaviours or syscalls may not be supported. That means that
for queries over a FUSE fs you could probably access files using the
interfaces in fd.c directly (I'm not sure whether some hacking is needed).
But for this kind of distributed filesystem, library access is usually
encouraged over FUSE access because of: 1) performance (look up how FUSE
works to see the long call paths it adds to every file access); 2) stability
(FUSE adds a kernel component to your software stack, and in my experience
handling its failure modes is really painful). For such storage I'd really
prefer another solution: library access as HAWQ does today, or an external
table, whatever works.
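
To make the trade-off concrete, here is a minimal sketch (the mount point
and file path are made up): through a FUSE mount the access path is just
ordinary POSIX calls, which is why fd.c could in principle work unchanged,
but every call below detours through app -> kernel -> FUSE daemon -> S3:

    #include <fcntl.h>
    #include <unistd.h>

    /* Read from a file on a FUSE mount (e.g. s3fs mounted at /mnt/s3).
     * These are the same syscalls fd.c issues against local storage;
     * the FUSE kernel module forwards each one to the userspace daemon. */
    int read_sample(char *buf, size_t len)
    {
        int fd = open("/mnt/s3/testfs/segfile.0", O_RDONLY);
        if (fd < 0)
            return -1;
        ssize_t n = read(fd, buf, len);
        close(fd);
        return (int) n;
    }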

Actually, a long time ago I saw FUSE over HDFS in a real production
environment, so I'm curious whether anyone has tried querying through that
path before and compared the performance, etc., with HAWQ's native access.




2017-03-15 1:26 GMT+08:00 Kyle Dunn <kdunn@pivotal.io>:

> Ming -
>
> Great points about append-only. One potential work-around is to split a
> table over multiple backend storage objects (a new object for each append
> operation), then, maybe as part of VACUUM, perform object compaction. For
> GCP, the server-side compaction capability for objects is called compose
> <https://cloud.google.com/storage/docs/gsutil/commands/compose>. For AWS,
> you can emulate this behavior using multipart upload
> <http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadInitiate.html> -
> demonstrated concretely with the Ruby SDK here
> <https://aws.amazon.com/blogs/developer/efficient-amazon-s3-object-concatenation-using-the-aws-sdk-for-ruby/>.
> Azure actually supports append blobs
> <https://blogs.msdn.microsoft.com/windowsazurestorage/2015/04/13/introducing-azure-storage-append-blob/>
> natively.
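>
> To sketch the idea in code (the storage_* helpers below are hypothetical
> stand-ins for GCS compose / S3 multipart-copy calls, not any real SDK
> API):
>
>     #include <stddef.h>
>
>     /* Hypothetical object-store primitives, to be backed by the
>      * cloud SDK of choice. */
>     extern int storage_put_object(const char *key, const void *buf, size_t len);
>     extern int storage_compose(const char *dst, const char **parts, int n);
>     extern int record_pending_part(const char *base, const char *part);
>
>     /* Emulated append: each append lands in its own object; a later
>      * compaction pass (e.g. during VACUUM) folds the parts back into
>      * the base object server-side. */
>     int ao_append(const char *base, const char *part,
>                   const void *buf, size_t len)
>     {
>         if (storage_put_object(part, buf, len) != 0)
>             return -1;
>         return record_pending_part(base, part);
>     }
>
>     int ao_compact(const char *base, const char **parts, int nparts)
>     {
>         /* Server-side concatenation: compose on GCS, multipart
>          * upload-part-copy on S3, append blobs on Azure. */
>         return storage_compose(base, parts, nparts);
>     }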
>
> For the FUSE exploration, can you (or anyone else) help me understand the
> relationship and/or call graph between these different implementations?
>
>    - backend/storage/file/filesystem.c
>    - bin/gpfilesystem/hdfs/gpfshdfs.c
>    - backend/storage/file/fd.c
>
> I feel confident that everything HDFS-related ultimately uses
> libhdfs3/src/client/Hdfs.cpp, but the path to get there from the backend
> code seems convoluted.
>
> Also, it looks like normal Postgres allows tablespaces to be created like
> this:
>
>       CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data';
>
> This is much simpler than wrapping glibc calls and is exactly what would be
> needed if FUSE modules + mount points were used to handle a "pluggable"
> backend. Maybe you (or someone) can advise how much effort it would be to
> bring "local:// FS" tablespace support back? It is potentially less effort
> than unravelling all the HDFS-specific implementation scattered around the
> backend code.
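>
> For instance, with s3fs mounted at /mnt/s3bucket on every host, a revived
> local tablespace would look like this (hypothetical, assuming local path
> support were restored):
>
>       CREATE TABLESPACE s3space LOCATION '/mnt/s3bucket/hawq';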
>
>
> Thanks,
> Kyle
>
> On Mon, Mar 13, 2017 at 8:35 PM Ming Li <mli@pivotal.io> wrote:
>
> > Hi Kyle,
> >
> > Good investigation!
> >
> > I think we can first add a tuple to pg_filesystem similar to the hdfs
> > one, then implement all the APIs referenced by that tuple to call the
> > FUSE-mounted filesystem.
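> >
> > As a sketch of how thin those implementations could be over a FUSE
> > mount (the function name and signature here are hypothetical; the real
> > hook signatures live in backend/storage/file/filesystem.c):
> >
> >     #include <fcntl.h>
> >
> >     /* A FUSE-backed "open" hook can simply delegate to POSIX open(),
> >      * since the FUSE daemon speaks the object-store protocol. */
> >     static int fusefs_open(const char *path, int flags, mode_t mode)
> >     {
> >         return open(path, flags, mode);
> >     }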
> >
> > However, because HAWQ is designed for HDFS, which is an append-only
> > filesystem, when we support other types of filesystem we should
> > investigate how to handle the performance and transaction issues. The
> > performance can be investigated after we implement a demo, but the
> > transaction issue should be decided beforehand. An append-only
> > filesystem doesn't support UPDATE in place, and inserted data is
> > tracked by file length in pg_aoseg.pg_aoseg_xxxxx or
> > pg_parquet.pg_parquet_xxxxx.
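> >
> > For example, the per-segment EOF that stands in for committed length
> > can be inspected directly (replace <relid> with the table's OID; the
> > column names are from the AO segment catalog, quoted from memory):
> >
> >       SELECT segno, eof, tupcount FROM pg_aoseg.pg_aoseg_<relid>;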
> >
> > Thanks.
> >
> >
> >
> >
> >
> > On Tue, Mar 14, 2017 at 7:57 AM, Kyle Dunn <kdunn@pivotal.io> wrote:
> >
> > > Hello devs -
> > >
> > > I'm doing some reading about HAWQ tablespaces here:
> > > http://hdb.docs.pivotal.io/212/hawq/ddl/ddl-tablespace.html
> > >
> > > I want to understand the flow of things; please correct me on the
> > > following assumptions:
> > >
> > > 1) Create a filesystem (not *really* supported after HAWQ init) - the
> > > default is obviously [lib]HDFS[3]:
> > >       SELECT * FROM pg_filesystem;
> > >
> > > 2) Create a filespace, referencing the above file system:
> > >       CREATE FILESPACE testfs ON hdfs
> > >       ('localhost:8020/fs/testfs') WITH (NUMREPLICA = 1);
> > >
> > > 3) Create a tablespace, referencing the above filespace:
> > >       CREATE TABLESPACE fastspace FILESPACE testfs;
> > >
> > > 4) Create objects referencing the above table space, or set it as the
> > > database's default:
> > >       CREATE DATABASE testdb WITH TABLESPACE=fastspace;
> > >
> > > Given this set of steps, is it true (*in theory*) that an arbitrary
> > > filesystem (i.e. storage backend) could be added to HAWQ using
> > > *existing* APIs?
> > >
> > > I realize the nuances of this are significant, but conceptually I'd
> > > like to gather some details, mainly in support of this
> > > <https://issues.apache.org/jira/browse/HAWQ-1270> ongoing JIRA
> > > discussion. I'm daydreaming about whether this neat tool:
> > > https://github.com/s3fs-fuse/s3fs-fuse could be useful for an S3 spike
> > > (which also seems to kind of work on Google Cloud, when interoperability
> > > <https://github.com/s3fs-fuse/s3fs-fuse/issues/109#issuecomment-286222694>
> > > mode is enabled). By its Linux FUSE nature, it implements the lion's
> > > share of the required pg_filesystem functions; in fact, maybe we could
> > > actually use system calls from glibc (somewhat
> > > <http://www.linux-mag.com/id/7814/>) directly in this situation.
> > >
> > > Curious to get some feedback.
> > >
> > >
> > > Thanks,
> > > Kyle
> > > --
> > > *Kyle Dunn | Data Engineering | Pivotal*
> > > Direct: 303.905.3171 | Email: kdunn@pivotal.io
> > >
> >
> --
> *Kyle Dunn | Data Engineering | Pivotal*
> Direct: 303.905.3171 | Email: kdunn@pivotal.io
>
