hawq-dev mailing list archives

From Kyle Dunn <kd...@pivotal.io>
Subject Re: Questions about filesystem / filespace / tablespace
Date Tue, 14 Mar 2017 17:26:12 GMT
Ming -

Great points about append-only. One potential work-around is to split a
table over multiple backend storage objects (a new object for each append
operation), then, perhaps as part of VACUUM, perform object compaction. For
GCP, the server-side compaction capability for objects is called compose
<https://cloud.google.com/storage/docs/gsutil/commands/compose>. For AWS,
you can emulate this behavior using Multipart upload
<http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadInitiate.html> -
demonstrated concretely with the Ruby SDK here
<https://aws.amazon.com/blogs/developer/efficient-amazon-s3-object-concatenation-using-the-aws-sdk-for-ruby/>.
Azure actually supports append-blobs
<https://blogs.msdn.microsoft.com/windowsazurestorage/2015/04/13/introducing-azure-storage-append-blob/>
 natively.
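
To make the S3 trick concrete, here is a rough sketch of "appending" by
copying the existing object back in as part 1 of a multipart upload and
uploading the new data as part 2. This is just my own illustration (boto3,
made-up bucket/key names), and it assumes the existing object is at least
5 MB, the minimum S3 allows for any non-final part:

    # Hypothetical sketch: emulate append on S3 via multipart upload, by
    # server-side copying the current object as part 1 and uploading the
    # new bytes as part 2. Assumes boto3 and an existing object >= 5 MB
    # (S3's minimum size for every part except the last).
    import boto3

    def s3_append(bucket, key, new_bytes):
        s3 = boto3.client("s3")
        mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
        upload_id = mpu["UploadId"]
        try:
            # Part 1: server-side copy of the current object contents.
            part1 = s3.upload_part_copy(
                Bucket=bucket, Key=key, UploadId=upload_id, PartNumber=1,
                CopySource={"Bucket": bucket, "Key": key})
            # Part 2: the data being "appended".
            part2 = s3.upload_part(
                Bucket=bucket, Key=key, UploadId=upload_id, PartNumber=2,
                Body=new_bytes)
            # Replace the object with the concatenation of both parts.
            s3.complete_multipart_upload(
                Bucket=bucket, Key=key, UploadId=upload_id,
                MultipartUpload={"Parts": [
                    {"PartNumber": 1,
                     "ETag": part1["CopyPartResult"]["ETag"]},
                    {"PartNumber": 2, "ETag": part2["ETag"]},
                ]})
        except Exception:
            s3.abort_multipart_upload(Bucket=bucket, Key=key,
                                      UploadId=upload_id)
            raise

    # e.g. s3_append("my-hawq-bucket", "fs/testfs/12345.1", b"new tuples")

On GCS the compose call does the concatenation server-side in one shot, so
no minimum-size dance is needed there.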

For the FUSE exploration, can you (or anyone else) help me understand the
relationship and/or call graph between these different implementations?

   - backend/storage/file/filesystem.c
   - bin/gpfilesystem/hdfs/gpfshdfs.c
   - backend/storage/file/fd.c

I feel confident that everything HDFS-related ultimately uses
libhdfs3/src/client/Hdfs.cpp but it seems like a convoluted path for
getting there from the backend code.

Also, it looks like normal Postgres allows tablespaces to be created like
this:

      CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data';

This is much simpler than wrapping glibc calls and is exactly what would be
necessary if using FUSE modules + mount points to handle a "pluggable"
backend. Maybe you (or someone) can advise how much effort it would be to
bring "local:// FS" tablespace support back? It is potentially less than
trying to unravel all the HDFS-specific implementation scattered around the
backend code.
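
For anyone who hasn't poked at FUSE internals, here is a toy passthrough
module (my own sketch using the Python fusepy bindings, not s3fs-fuse or
HAWQ code, with made-up paths) just to show the surface area involved. A
real backend like s3fs-fuse implements this same Operations interface
against an object store instead of a local directory:

    # Toy passthrough FUSE module (fusepy) - an illustration of the kind of
    # mount-point shim a "local://" tablespace could sit on top of.
    import os
    import sys
    from fuse import FUSE, Operations

    class Passthrough(Operations):
        def __init__(self, root):
            self.root = root

        def _full(self, path):
            # Map the FUSE path onto the backing directory.
            return os.path.join(self.root, path.lstrip('/'))

        def getattr(self, path, fh=None):
            st = os.lstat(self._full(path))
            return {k: getattr(st, k) for k in (
                'st_mode', 'st_nlink', 'st_size', 'st_uid', 'st_gid',
                'st_atime', 'st_mtime', 'st_ctime')}

        def readdir(self, path, fh):
            return ['.', '..'] + os.listdir(self._full(path))

        def create(self, path, mode, fi=None):
            return os.open(self._full(path), os.O_WRONLY | os.O_CREAT, mode)

        def open(self, path, flags):
            return os.open(self._full(path), flags)

        def read(self, path, size, offset, fh):
            os.lseek(fh, offset, os.SEEK_SET)
            return os.read(fh, size)

        def write(self, path, data, offset, fh):
            os.lseek(fh, offset, os.SEEK_SET)
            return os.write(fh, data)

        def release(self, path, fh):
            return os.close(fh)

    if __name__ == '__main__':
        # Hypothetical usage: python passthrough.py /data/backing /mnt/objectstore
        FUSE(Passthrough(sys.argv[1]), sys.argv[2], foreground=True)

Mount something like that (or s3fs-fuse itself) at, say, /mnt/objectstore,
and a plain local-filesystem tablespace pointed at that directory would in
principle be talking to the object store without the backend knowing.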


Thanks,
Kyle

On Mon, Mar 13, 2017 at 8:35 PM Ming Li <mli@pivotal.io> wrote:

> Hi Kyle,
>
> Good investigation!
>
> I think we can start by adding a tuple similar to the hdfs one in
> pg_filesystem, then implement all of the APIs referenced by that tuple so
> they call the FUSE API.
>
> However, because HAWQ is designed for HDFS, which is an append-only file
> system, when we support other types of filesystem we should investigate
> both performance and transaction issues. Performance can be investigated
> after we implement a demo, but the transaction question should be decided
> beforehand. An append-only file system doesn't support UPDATE in place,
> and inserted data is tracked by file length in pg_aoseg.pg_aoseg_xxxxx or
> pg_parquet.pg_parquet_xxxxx.
>
> Thanks.
>
>
>
>
>
> On Tue, Mar 14, 2017 at 7:57 AM, Kyle Dunn <kdunn@pivotal.io> wrote:
>
> > Hello devs -
> >
> > I'm doing some reading about HAWQ tablespaces here:
> > http://hdb.docs.pivotal.io/212/hawq/ddl/ddl-tablespace.html
> >
> > I want to understand the flow of things; please correct me on the
> > following assumptions:
> >
> > 1) Create a filesystem (not *really* supported after HAWQ init) - the
> > default is obviously [lib]HDFS[3]:
> >       SELECT * FROM pg_filesystem;
> >
> > 2) Create a filespace, referencing the above file system:
> >       CREATE FILESPACE testfs ON hdfs
> >       ('localhost:8020/fs/testfs') WITH (NUMREPLICA = 1);
> >
> > 3) Create a tablespace, reference the above filespace:
> >       CREATE TABLESPACE fastspace FILESPACE testfs;
> >
> > 4) Create objects referencing the above table space, or set it as the
> > database's default:
> >       CREATE DATABASE testdb WITH TABLESPACE=fastspace;
> >
> > Given this set of steps, is it true (*in theory*) that an arbitrary filesystem
> > (i.e. storage backend) could be added to HAWQ using *existing* APIs?
> >
> > I realize the nuances of this are significant, but conceptually I'd like
> > to gather some details, mainly in support of this
> > <https://issues.apache.org/jira/browse/HAWQ-1270> ongoing JIRA
> > discussion.
> > I'm daydreaming about whether this neat tool:
> > https://github.com/s3fs-fuse/s3fs-fuse could be useful for an S3 spike
> > (which also seems to kind of work on Google Cloud, when interoperability
> > <https://github.com/s3fs-fuse/s3fs-fuse/issues/109#issuecomment-286222694>
> > mode is enabled). By its Linux FUSE nature, it implements the lion's
> > share of required pg_filesystem functions; in fact, maybe we could
> > actually use system calls from glibc (somewhat
> > <http://www.linux-mag.com/id/7814/>) directly in this situation.
> >
> > Curious to get some feedback.
> >
> >
> > Thanks,
> > Kyle
> > --
> > *Kyle Dunn | Data Engineering | Pivotal*
> > Direct: 303.905.3171 | Email: kdunn@pivotal.io
> >
>
-- 
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 | Email: kdunn@pivotal.io
