hawq-dev mailing list archives

From Ming Li <...@pivotal.io>
Subject Re: Questions about filesystem / filespace / tablespace
Date Tue, 14 Mar 2017 02:35:09 GMT
Hi Kyle,

Good investigation!

I think we can first add a tuple to pg_filesystem similar to the one for
hdfs, then implement all the API functions named in that tuple so that they
call the FUSE API.

However, HAWQ was designed for HDFS, which means an append-only file
system, so when we support other types of file systems we should
investigate the performance and transaction issues. Performance can be
investigated after we implement a demo, but the transaction issue should be
decided beforehand. An append-only file system doesn't support UPDATE in
place, and the inserted data is tracked by the file length recorded in
pg_aoseg.pg_aoseg_xxxxx or pg_parquet.pg_parquet_xxxxx.
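
To make the transaction point concrete, here is a rough sketch of the idea
of tracking visible data by a committed file length. It is only a toy model
and not HAWQ code: the class AppendOnlySegment, its methods, and the
in-memory committed_eof field (standing in for the length stored in the
catalog) are all made up for illustration.

```python
import os
import tempfile


class AppendOnlySegment:
    """Toy append-only segment file; visible data ends at a committed length."""

    def __init__(self, path):
        self.path = path
        # Hypothetical stand-in for the file length recorded in the catalog.
        self.committed_eof = 0
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, payload: bytes) -> int:
        """Write bytes after the physical end; return the new physical length."""
        with open(self.path, "ab") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        return os.path.getsize(self.path)

    def commit(self) -> None:
        """Make everything written so far visible to readers."""
        self.committed_eof = os.path.getsize(self.path)

    def abort(self) -> None:
        """Discard uncommitted bytes by cutting back to the committed length."""
        os.truncate(self.path, self.committed_eof)

    def read_visible(self) -> bytes:
        """Readers never look past the committed length."""
        with open(self.path, "rb") as f:
            return f.read(self.committed_eof)


def demo():
    with tempfile.TemporaryDirectory() as d:
        seg = AppendOnlySegment(os.path.join(d, "seg_1"))
        seg.append(b"row1\n")
        seg.commit()
        seg.append(b"row2\n")  # written but never committed
        visible_before_abort = seg.read_visible()
        seg.abort()
        size_after_abort = os.path.getsize(seg.path)
        return visible_before_abort, size_after_abort
```

In this model a reader only ever sees bytes below the committed length, so
an aborted insert needs no in-place rewrite. Whatever file system we plug in
would have to give us at least reliable appends and an accurate file length
for a scheme like this to work.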

Thanks.





On Tue, Mar 14, 2017 at 7:57 AM, Kyle Dunn <kdunn@pivotal.io> wrote:

> Hello devs -
>
> I'm doing some reading about HAWQ tablespaces here:
> http://hdb.docs.pivotal.io/212/hawq/ddl/ddl-tablespace.html
>
> I want to understand the flow of things, please correct me on the following
> assumptions:
>
> 1) Create a filesystem (not *really* supported after HAWQ init) - the
> default is obviously [lib]HDFS[3]:
>       SELECT * FROM pg_filesystem;
>
> 2) Create a filespace, referencing the above file system:
>       CREATE FILESPACE testfs ON hdfs
>       ('localhost:8020/fs/testfs') WITH (NUMREPLICA = 1);
>
> 3) Create a tablespace, referencing the above filespace:
>       CREATE TABLESPACE fastspace FILESPACE testfs;
>
> 4) Create objects referencing the above tablespace, or set it as the
> database's default:
>       CREATE DATABASE testdb WITH TABLESPACE=fastspace;
>
> Given this set of steps, is it true (*in theory*) that an arbitrary
> filesystem (i.e. storage backend) could be added to HAWQ using *existing*
> APIs?
>
> I realize the nuances of this are significant, but conceptually I'd like to
> gather some details, mainly in support of this
> <https://issues.apache.org/jira/browse/HAWQ-1270> ongoing JIRA discussion.
> I'm daydreaming about whether this neat tool:
> https://github.com/s3fs-fuse/s3fs-fuse could be useful for an S3 spike
> (which also seems to kind of work on Google Cloud, when interoperability
> <https://github.com/s3fs-fuse/s3fs-fuse/issues/109#issuecomment-286222694>
> mode is enabled). By its Linux FUSE nature, it implements the lion's share
> of required pg_filesystem functions; in fact, maybe we could actually use
> system calls from glibc (somewhat <http://www.linux-mag.com/id/7814/>)
> directly in this situation.
>
> Curious to get some feedback.
>
>
> Thanks,
> Kyle
> --
> *Kyle Dunn | Data Engineering | Pivotal*
> Direct: 303.905.3171 <3039053171> | Email: kdunn@pivotal.io
>
