hawq-dev mailing list archives

From Cyrille Lintz <cli...@pivotal.io>
Subject HAWQ: Web external table on segments.
Date Wed, 05 Apr 2017 07:11:17 GMT
Hello,

From the HDB guide (
http://hdb.docs.pivotal.io/212/hawq/reference/sql/CREATE-EXTERNAL-TABLE.html#topic1__section4),
I read the following about web external tables:

*Note: ON ALL/HOST is deprecated when creating a readable external table,
as HAWQ cannot guarantee scheduling executors on a specific host. Instead,
use ON MASTER, ON <number>, or SEGMENT <virtual_segment> to specify which
segment instances will execute the command.*


In my opinion, if possible, we should re-introduce the ON ALL option for
external web tables. I am concerned about the ON <number> option in the
external web table definition: we have to set it to the current number of
hosts, so if we expand the cluster, we will have to change the external web
table definition.

- If the value is smaller than the actual number of hosts, some rows
will be missing.
- If the value is greater than the actual number of hosts, some rows
will be duplicated.
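
To make the two cases above concrete, here is a minimal Python sketch. It is
only a toy model: it assumes the ON <number> executions are placed round-robin
over the hosts, which HAWQ does not guarantee, and the host names and the
function rows_per_host are hypothetical.

```python
from collections import Counter


def rows_per_host(hosts, on_number):
    """Count how many times the command would run on each host.

    Assumption (not HAWQ behavior): the `on_number` executions are
    placed round-robin over `hosts`. Each execution yields one row.
    """
    placements = [hosts[i % len(hosts)] for i in range(on_number)]
    counts = Counter(placements)
    # Report every host, including those that received no execution.
    return {host: counts.get(host, 0) for host in hosts}


if __name__ == "__main__":
    hosts = ["host1", "host2", "host3"]  # hypothetical 3-host cluster
    for on_number in (2, 3, 4):
        print(on_number, rows_per_host(hosts, on_number))
```

In this model, a value of 2 leaves one host with no row, a value of 3 gives
exactly one row per host, and a value of 4 gives one host two rows. Only the
value equal to the host count is correct, and it stops being correct as soon
as the cluster is expanded.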


If we add the option ON ALL:

- it will help to monitor the spill files
- it will help to read the segment log files (see the commented DDL
hawq_toolkit._hawq_log_segment_ext in the file $GPHOME/share/postgresql)


I know that the ON HOST and ON ALL options were deprecated because of the
elastic runtime in HAWQ 2.x, which is related to the Hadoop architecture.

However, how can we execute a shell command exactly once on each host of the
cluster via an external web table?
In this case, we are not using the Hadoop FS, but the local FS.

Thanks,


*Cyrille LINTZ*
Advisory Solution Architect | Pivotal Europe South
Mobile: + 33 (0)6 11 48 71 10 | clintz@pivotal.io
