From: Ming Li
Date: Wed, 15 Mar 2017 18:18:03 +0800
Subject: Re: Questions about filesystem / filespace / tablespace
To: dev@hawq.incubator.apache.org

Hi Kyle,

If we keep all of these filesystems similar to HDFS, i.e. append-only, then the required changes should be much smaller. I think we can go ahead and implement a demo if we have the resources; we may run into problems along the way, but we can find solutions or workarounds for them.

--------------------

For your question about the relationship between the three source files, below is my understanding (the code was not written by me, so my reading may not be completely correct):

(1) bin/gpfilesystem/hdfs/gpfshdfs.c -- implements all of the APIs referenced by the hdfs tuple in the pg_filesystem catalog; it calls libhdfs3 directly to access the HDFS file system. The reason for making it a wrapper is that these APIs are defined as UDFs, so we can support a similar filesystem by adding another tuple to pg_filesystem and a file of similar wrapper code, without changing any of the call sites. And because they are UDFs, an already-installed HAWQ binary can be upgraded to add a new file system.

(2) backend/storage/file/filesystem.c -- because all of the APIs in (1) are UDFs, a conversion layer is needed in order to call them directly. This file is responsible for converting the normal HDFS calls made by the HAWQ kernel into UDF calls.

(3) backend/storage/file/fd.c -- because the OS limits the number of open file descriptors, PostgreSQL/HAWQ keeps an LRU cache of opened file handles. The HDFS functions in this file manage file handles in the same way as for native file systems, and they call the APIs in (2) to interact with HDFS.

In short, the calling stack is: (3) --> (2) --> (1) --> libhdfs3 API.
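To make the wrapper pattern in (1) a bit more concrete, here is a minimal sketch of what a UDF wrapper for an "open" call might look like. This is not the real gpfshdfs.c code: the function name, the argument-passing convention, and the error handling are illustrative assumptions, and it uses plain PostgreSQL fmgr conventions rather than HAWQ's internal filesystem-UDF interface; only the fmgr macros and the libhdfs3 calls (hdfsConnect / hdfsOpenFile) are real APIs.

    /*
     * Illustrative sketch only -- not the actual gpfshdfs.c. A real wrapper
     * receives its arguments through HAWQ's internal filesystem-UDF
     * convention; plain text/int4 arguments are assumed here purely to
     * show the "UDF wrapper around libhdfs3" pattern.
     */
    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/builtins.h"

    #include "hdfs/hdfs.h"          /* libhdfs3 C client header (path assumed) */

    PG_MODULE_MAGIC;                /* needed only if built as a loadable module */

    PG_FUNCTION_INFO_V1(myfs_openfile);

    Datum
    myfs_openfile(PG_FUNCTION_ARGS)
    {
        /* hypothetical argument convention: (host text, port int4, path text, flags int4) */
        char   *host  = text_to_cstring(PG_GETARG_TEXT_P(0));
        int     port  = PG_GETARG_INT32(1);
        char   *path  = text_to_cstring(PG_GETARG_TEXT_P(2));
        int     flags = PG_GETARG_INT32(3);

        /* connect and open through libhdfs3; a real wrapper would cache the connection */
        hdfsFS   fs   = hdfsConnect(host, port);
        hdfsFile file = hdfsOpenFile(fs, path, flags,
                                     0 /* default buffer size */,
                                     0 /* default replication */,
                                     0 /* default block size */);

        if (file == NULL)
            ereport(ERROR, (errmsg("myfs: could not open \"%s\"", path)));

        /* hand the opaque handle back; in HAWQ it is fd.c that tracks it in the LRU cache */
        PG_RETURN_POINTER(file);
    }

A new filesystem would register wrappers like this in pg_filesystem, filesystem.c would invoke them through the normal UDF call path, and fd.c would never need to know which storage backend it is talking to.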
--------------------

On the last question about tablespaces: PostgreSQL introduced them so that users can map different tablespaces to different paths, and those paths can be mounted with different file systems on Linux. But all of those filesystems expose the same API and the same functionality (they support UPDATE in place), so we cannot directly use tablespaces to handle this scenario. I also can't estimate how much effort would be needed, because I did not participate in the original HDFS filesystem support in HAWQ.

That's my opinion; any corrections or suggestions are welcome. Hope it helps you. Thanks.

On Wed, Mar 15, 2017 at 11:07 AM, Paul Guo wrote:

> Hi Kyle,
>
> I'm not sure whether I understand your point correctly, but with FUSE,
> which allows userspace file system implementations on Linux, users use the
> filesystem (e.g. S3 in your example) as block storage and access it via
> standard syscalls like open, close, read and write, although some
> behaviours or syscalls may not be supported. That means that for queries
> over a FUSE fs you can probably access the files through the interfaces in
> fd.c directly (I'm not sure whether some hacking is needed), but for this
> kind of distributed file system, library access is usually preferred over
> FUSE access because of: 1) performance (look up how FUSE works to see the
> long call paths it adds for every file access) and 2) stability (you add
> the FUSE kernel component to your software stack, and in my experience it
> is really painful to handle some of the resulting exceptions). For such
> storage I'd really prefer some other solution: library access as in HAWQ,
> an external table, whatever.
>
> Actually, a long time ago I saw FUSE over HDFS in a real production
> environment, so I'm curious whether anyone has tried querying through this
> kind of setup before and compared it with HAWQ for performance, etc.
>
> 2017-03-15 1:26 GMT+08:00 Kyle Dunn :
>
> > Ming -
> >
> > Great points about append-only. One potential work-around is to split a
> > table over multiple backend storage objects (a new file for each append
> > operation), then, maybe as part of VACUUM, perform object compaction.
> > For GCP, the server-side compaction capability for objects is called
> > compose. For AWS, you can emulate this behavior using multipart upload -
> > demonstrated concretely with the Ruby SDK here
> > <...object-concatenation-using-the-aws-sdk-for-ruby/>. Azure actually
> > supports append blobs <...04/13/introducing-azure-storage-append-blob/>
> > natively.
> >
> > For the FUSE exploration, can you (or anyone else) help me understand
> > the relationship and/or call graph between these different
> > implementations?
> >
> > - backend/storage/file/filesystem.c
> > - bin/gpfilesystem/hdfs/gpfshdfs.c
> > - backend/storage/file/fd.c
> >
> > I feel confident that everything HDFS-related ultimately uses
> > libhdfs3/src/client/Hdfs.cpp, but it seems like a convoluted path for
> > getting there from the backend code.
> >
> > Also, it looks like normal Postgres allows tablespaces to be created
> > like this:
> >
> > CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data';
> >
> > This is much simpler than wrapping glibc calls and is exactly what would
> > be necessary if using FUSE modules + mount points to handle a
> > "pluggable" backend. Maybe you (or someone) can advise how much effort
> > it would be to bring "local:// FS" tablespace support back? It is
> > potentially less than trying to unravel all the HDFS-specific
> > implementation scattered around the backend code.
> >
> > Thanks,
> > Kyle
> >
> > On Mon, Mar 13, 2017 at 8:35 PM Ming Li wrote:
> >
> > > Hi Kyle,
> > >
> > > Good investigation!
> > >
> > > I think we can first add a tuple similar to hdfs in pg_filesystem, and
> > > then implement all of the APIs referenced by that tuple to call the
> > > FUSE API.
> > >
> > > However, because HAWQ is designed for HDFS, which is an append-only
> > > file system, when we support other types of filesystem we should
> > > investigate how to handle the performance and transaction issues.
> > > Performance can be investigated after we implement a demo, but the
> > > transaction issue should be decided beforehand. An append-only file
> > > system doesn't support UPDATE in place, and the inserted data are
> > > tracked by file length in pg_aoseg.pg_aoseg_xxxxx or
> > > pg_parquet.pg_parquet_xxxxx.
> > >
> > > Thanks.
> > >
> > > On Tue, Mar 14, 2017 at 7:57 AM, Kyle Dunn wrote:
> > >
> > > > Hello devs -
> > > >
> > > > I'm doing some reading about HAWQ tablespaces here:
> > > > http://hdb.docs.pivotal.io/212/hawq/ddl/ddl-tablespace.html
> > > >
> > > > I want to understand the flow of things, so please correct me on the
> > > > following assumptions:
> > > >
> > > > 1) Create a filesystem (not *really* supported after HAWQ init) -
> > > > the default is obviously [lib]HDFS[3]:
> > > > SELECT * FROM pg_filesystem;
> > > >
> > > > 2) Create a filespace, referencing the above filesystem:
> > > > CREATE FILESPACE testfs ON hdfs
> > > > ('localhost:8020/fs/testfs') WITH (NUMREPLICA = 1);
> > > >
> > > > 3) Create a tablespace, referencing the above filespace:
> > > > CREATE TABLESPACE fastspace FILESPACE testfs;
> > > >
> > > > 4) Create objects referencing the above tablespace, or set it as the
> > > > database's default:
> > > > CREATE DATABASE testdb WITH TABLESPACE=testfs;
> > > >
> > > > Given this set of steps, is it true (*in theory*) that an arbitrary
> > > > filesystem (i.e. storage backend) could be added to HAWQ using
> > > > *existing* APIs?
> > > >
> > > > I realize the nuances of this are significant, but conceptually I'd
> > > > like to gather some details, mainly in support of this ongoing JIRA
> > > > discussion. I'm daydreaming about whether this neat tool:
> > > > https://github.com/s3fs-fuse/s3fs-fuse could be useful for an S3
> > > > spike (which also seems to kind of work on Google Cloud, when
> > > > interoperability
> > > > <https://github.com/s3fs-fuse/s3fs-fuse/issues/109#issuecomment-286222694>
> > > > mode is enabled). By its Linux FUSE nature, it implements the lion's
> > > > share of the required pg_filesystem functions; in fact, maybe we
> > > > could actually use the system calls from glibc (somewhat) directly
> > > > in this situation.
> > > >
> > > > Curious to get some feedback.
> > > >
> > > > Thanks,
> > > > Kyle
> > > > --
> > > > *Kyle Dunn | Data Engineering | Pivotal*
> > > > Direct: 303.905.3171 | Email: kdunn@pivotal.io
> >
> > --
> > *Kyle Dunn | Data Engineering | Pivotal*
> > Direct: 303.905.3171 | Email: kdunn@pivotal.io