hawq-dev mailing list archives

From Lili Ma <...@pivotal.io>
Subject Re: Options Usage of "hawq register"
Date Tue, 16 Aug 2016 03:34:53 GMT
@Lei, the current hawq register supports two cases: specifying a table name
and file path, or specifying a .yml file. That is why we proposed this usage.

For the first case, we can follow the "hawq command" pattern and change the
tablename to the object, such as *hawq register [-h hostname] [-p port] [-U
username] [-d database] [-f filepath] [-c config] <tablename>*

But for the second case, the table name is already inside the .yml file. If we
change tablename to the object and mark it as a required field, it duplicates
the name inside the configuration file. And what shall we do if the two names
conflict?

Could you suggest a better way for defining the usage? Thanks :)

Thanks
Lili

On Tue, Aug 16, 2016 at 8:23 AM, Lei Chang <lei_chang@apache.org> wrote:

> I think this is a very useful feature for backup/restore, disaster
> recovery and some other scenarios.
>
> From the usage side, "hawq register" follows the typical "hawq command"
> design pattern: that is, "hawq action object". But for "hawq register",
> there is no "object" here.
>
> ---------------------------
> hawq extract -o t1.yml t1;
> hawq register -h localhost -p 5432 -U me -d postgres -c t1.yml;
> ---------------------------
>
> Cheers
> Lei
>
>
> On Mon, Aug 15, 2016 at 3:25 PM, Hong Wu <xunzhangthu@gmail.com> wrote:
>
> > Hi HAWQ developers,
> >
> > This thread is meant to confirm the option usage of hawq register.
> >
> > So far there are two scenarios in which users will use the hawq register
> > tool:
> > - I. Register external parquet data into HAWQ. For example, users want to
> > migrate parquet tables from Hive to HAWQ as quickly as possible. In this
> > case, only the parquet format is supported and the original parquet files in
> > Hive are moved.
> >
> > - II. Users should be able to use hawq register to register table files
> > into a new HAWQ cluster. From the users' perspective, this is a kind of
> > protection against corruption. Users use the last-known-good metadata to
> > update the portion of the catalog that manages HDFS blocks. The table files or
> > dictionary should be backed up (for example, using distcp) into the same path
> > in the new HDFS setting. In this case, both AO and Parquet formats are
> > supported.
> >
> > Considering the above cases, the proposed options for hawq register look like
> > this:
> >
> > hawq register [-h hostname] [-p port] [-U username] [-d database] [-t
> > tablename] [-f filepath] [-c config]
> > Note that the -h, -p, -U options are optional. The -c option and the -t, -f
> > options are mutually exclusive, and correspond to the two different cases
> > above. Consequently, the expected usage of hawq register should be
> > like below:
> >
> > - Case I
> > hadoop fs -put -f hdfs://localhost:8020/hive/original_data.paq
> > hdfs://localhost:8020/test_data.paq;
> >
> > create table t1(i int) with (appendonly = true, orientation=parquet);
> >
> > hawq register -h localhost -p 5432 -U me -d postgres -t t1 -f
> > hdfs://localhost:8020/test_data.paq;
> >
> > - Case II
> > hawq extract -o t1.yml t1;
> >
> > hawq register -h localhost -p 5432 -U me -d postgres -c t1.yml;
> >
> > Incorrect usage (in each of these cases, hawq register will print an error
> > and then exit):
> > hawq register -h localhost -p 5432 -U me -d postgres -c t1.yml -t t1;
> > hawq register -h localhost -p 5432 -U me -d postgres -c t1.yml -f
> > hdfs://localhost:8020/test_data.paq;
> > hawq register -h localhost -p 5432 -U me -d postgres -c t1.yml -t t1 -f
> > hdfs://localhost:8020/test_data.paq;
> >
> > Does this design make sense? Any comments? Thanks.
> >
> > Best
> > Hong
> >
>
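
The option rule in the quoted proposal (-c on one side, -t together with -f on the other, with an error and exit for any mix) can be sketched with Python's argparse. This is only an illustration of the proposed behaviour, not the actual hawq register code: the function names build_parser and parse_register_args, the dest names, and the error messages are made up for this example. It also assumes that -t and -f must both be present when -c is absent, which the thread implies but does not state outright.

```python
import argparse


def build_parser():
    # add_help=False because the proposal uses -h for the hostname,
    # which would otherwise clash with argparse's built-in -h/--help.
    parser = argparse.ArgumentParser(prog="hawq register", add_help=False)
    parser.add_argument("-h", dest="hostname")
    parser.add_argument("-p", dest="port", type=int)
    parser.add_argument("-U", dest="username")
    parser.add_argument("-d", dest="database")
    parser.add_argument("-t", dest="tablename")
    parser.add_argument("-f", dest="filepath")
    parser.add_argument("-c", dest="config")
    return parser


def parse_register_args(argv):
    """Parse argv and enforce the -c versus -t/-f rule from the proposal."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.config is not None:
        # Case II: the .yml file carries the table information,
        # so -t and -f are rejected.
        if args.tablename is not None or args.filepath is not None:
            parser.error("-c cannot be combined with -t or -f")
    elif args.tablename is None or args.filepath is None:
        # Case I (assumed): both the table name and the file path are needed.
        parser.error("either -c, or both -t and -f, must be given")
    return args
```

parser.error prints the message to stderr and exits with status 2, which matches the "print an error and then exit" behaviour described for the incorrect invocations. The check is written by hand because argparse's add_mutually_exclusive_group can only express "one of these options", not "-c versus the pair -t and -f".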
