hawq-dev mailing list archives

From Lili Ma <...@pivotal.io>
Subject Re: Options Usage of "hawq register"
Date Mon, 15 Aug 2016 07:57:19 GMT
Thanks, Hong, for the detailed explanation. A small typo: it should be
"parquet" instead of "parquer". :)

We do not need to specify the table name in scenario II because the .yml
file generated by hawq extract covers the table schema. hawq register can
use that schema directly to create the table.

Another thing to mention here is hash-distributed tables. We don't support
registering into a hash-distributed table for case I. For case II, we can
support it, so that the extracted yml file can work.

Also, I think we need to pay special attention to partitioned tables during
implementation and verify correctness.

Thanks
Lili

On Mon, Aug 15, 2016 at 3:25 PM, Hong Wu <xunzhangthu@gmail.com> wrote:

> Hi HAWQ developers,
>
> This thread is meant to confirm the option usage of hawq register.
>
> So far there are two scenarios in which users use the hawq register tool.
> - I. Register external parquet data into HAWQ. For example, users want to
> migrate parquet tables from Hive to HAWQ as quickly as possible. In this
> case, only the parquet format is supported, and the original parquet files
> in Hive are moved.
>
> - II. Users should be able to use hawq register to register table files
> into a new HAWQ cluster. From the users' perspective, this is a kind of
> protection against corruption: users apply the last-known-good metadata to
> update the portion of the catalog that manages HDFS blocks. The table files
> or dictionary should be backed up (for example with distcp) to the same
> path in the new HDFS setting. In this case, both AO and Parquet formats are
> supported.
>
> Considering the above cases, the designed options for hawq register look
> like this:
>
> hawq register [-h hostname] [-p port] [-U username] [-d database] [-t
> tablename] [-f filepath] [-c config]
> Note that the -h, -p, and -U options are optional. The -c option and the
> -t, -f options are mutually exclusive; they correspond to the two cases
> above. Consequently, the expected usage of hawq register is as follows:
>
> - Case I
> hadoop fs -put -f hdfs://localhost:8020/hive/original_data.paq
> hdfs://localhost:8020/test_data.paq;
>
> create table t1(i int) with (appendonly = true, orientation=parquer);
>
> hawq register -h localhost -p 5432 -U me -d postgres -t t1 -f
> hdfs://localhost:8020/test_data.paq;
>
> - Case II
> hawq extract -o t1.yml t1;
>
> hawq register -h localhost -p 5432 -U me -d postgres -c t1.yml;
>
> Incorrect usage (in each of these cases, hawq register will print an error
> and then exit):
> hawq register -h localhost -p 5432 -U me -d postgres -c t1.yml -t t1;
> hawq register -h localhost -p 5432 -U me -d postgres -c t1.yml -f
> hdfs://localhost:8020/test_data.paq;
> hawq register -h localhost -p 5432 -U me -d postgres -c t1.yml -t t1 -f
> hdfs://localhost:8020/test_data.paq;
>
> Does this design make sense? Any comments? Thanks.
>
> Best
> Hong
>
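
The option rule in the quoted proposal (-c on its own, or -t together with
-f, but never mixed) could be checked roughly as sketched below. This is
only an illustration in Python with argparse, not the actual hawq register
code. The function name parse_register_args, the "case I" / "case II"
labels, and the requirement that case I needs both -t and -f are
assumptions made for the example.

```python
import argparse


def parse_register_args(argv):
    """Parse a hawq-register-style option list (hypothetical helper).

    Returns (mode, args), where mode is "case I" (-t and -f given) or
    "case II" (-c given). Raises ValueError for mixed or missing options.
    """
    # add_help=False frees "-h" so it can mean hostname, as in the proposal.
    parser = argparse.ArgumentParser(prog="hawq register", add_help=False)
    parser.add_argument("-h", dest="hostname")
    parser.add_argument("-p", dest="port", type=int)
    parser.add_argument("-U", dest="username")
    parser.add_argument("-d", dest="database")
    parser.add_argument("-t", dest="tablename")
    parser.add_argument("-f", dest="filepath")
    parser.add_argument("-c", dest="config")
    args = parser.parse_args(argv)

    if args.config is not None:
        # Case II: the yml file already carries the table information,
        # so -t and -f must not be given alongside it.
        if args.tablename is not None or args.filepath is not None:
            raise ValueError("-c cannot be combined with -t or -f")
        return "case II", args

    # Case I (assumption: both -t and -f are required here).
    if args.tablename is not None and args.filepath is not None:
        return "case I", args

    raise ValueError("give either -c, or both -t and -f")
```

With this sketch, the option list "-d postgres -c t1.yml" is accepted as
case II, while "-d postgres -c t1.yml -t t1" raises ValueError, matching
the incorrect-usage list above.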
