hawq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From dyozie <...@git.apache.org>
Subject [GitHub] incubator-hawq-docs pull request #83: HAWQ-1252 - pljava doc cleanup, workin...
Date Wed, 11 Jan 2017 19:59:59 GMT
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/83#discussion_r95658926
  
    --- Diff: markdown/plext/using_pljava.html.md.erb ---
    @@ -2,205 +2,222 @@
     title: Using PL/Java
     ---
     
    -This section contains an overview of the HAWQ PL/Java language. 
    +This section provides an overview of the HAWQ PL/Java procedural language extension.

     
     
     ## <a id="aboutpljava"></a>About PL/Java 
     
    -With the HAWQ PL/Java extension, you can write Java methods using your favorite Java
IDE and install the JAR files that implement the methods in your HAWQ cluster.
    +With the PL/Java extension, you can write Java methods using your favorite Java IDE and
invoke the methods from PostgreSQL user-defined functions (UDFs). 
     
    -**Note**: If building HAWQ from source, you must specify PL/Java as a build option when
compiling HAWQ. To use PL/Java in a HAWQ deployment, you must explicitly enable the PL/Java
extension in all desired databases.  
    +The HAWQ PL/Java package is based on the open source PL/Java 1.4.0 and provides the following
features:
     
    -The HAWQ PL/Java package is based on the open source PL/Java 1.4.0. HAWQ PL/Java provides
the following features.
    -
    -- Ability to execute PL/Java functions with Java 1.6 or 1.7.
    -- Standardized utilities (modeled after the SQL 2003 proposal) to install and maintain
Java code in the database.
    -- Standardized mappings of parameters and result. Complex types as well as sets are supported.
    -- An embedded, high performance, JDBC driver utilizing the internal HAWQ Database SPI
routines.
    +- PL/Java function execution with Java 1.6 or 1.7.
    +- Standardized mappings of Java and PostgreSQL parameters and results. Complex types
as well as sets are supported.
    +- Two HAWQ Java languages:
    +	- `java` - Tusted PL/Java language
    +	- `javau` - Untrusted PL/Java language (superusers only)
    +- An embedded, high performance, JDBC driver utilizing the internal HAWQ Database Server
Programming Interface (SPI).
     - Metadata support for the JDBC driver. Both `DatabaseMetaData` and `ResultSetMetaData`
are included.
    -- The ability to return a `ResultSet` from a query as an alternative to building a ResultSet
row by row.
    +- A standard way of passing parameters and return values. Complex types and sets are
passed using the standard JDBC `ResultSet` class.
    +- The ability to return a `ResultSet` from a query as an alternative to building a `ResultSet`
row by row.
     - Full support for savepoints and exception handling.
    -- The ability to use IN, INOUT, and OUT parameters.
    -- Two separate HAWQ languages:
    -	- pljava, TRUSTED PL/Java language
    -	- pljavau, UNTRUSTED PL/Java language
     - Transaction and Savepoint listeners enabling code execution when a transaction or savepoint
is committed or rolled back.
     - Integration with GNU GCJ on selected platforms.
     
    -A function in SQL will appoint a static method in a Java class. In order for the function
to execute, the appointed class must available on the class path specified by the HAWQ server
configuration parameter `pljava_classpath`. The PL/Java extension adds a set of functions
that helps to install and maintain the Java classes. Classes are stored in normal Java archives,
JAR files. A JAR file can optionally contain a deployment descriptor that in turn contains
SQL commands to be executed when the JAR is deployed or undeployed. The functions are modeled
after the standards proposed for SQL 2003.
    -
    -PL/Java implements a standard way of passing parameters and return values. Complex types
and sets are passed using the standard JDBC ResultSet class.
    -
    -A JDBC driver is included in PL/Java. This driver calls HAWQ internal SPI routines. The
driver is essential since it is common for functions to make calls back to the database to
fetch data. When PL/Java functions fetch data, they must use the same transactional boundaries
that are used by the main function that entered PL/Java execution context.
    +PL/Java is optimized for performance. The Java virtual machine executes within the same
process as the backend, minimizing call overhead. PL/Java brings the power of Java to the
database itself, enabling the database-intensive business logic to execute as close to the
actual data as possible.
     
    -PL/Java is optimized for performance. The Java virtual machine executes within the same
process as the backend to minimize call overhead. PL/Java is designed with the objective to
enable the power of Java to the database itself so that database intensive business logic
can execute as close to the actual data as possible.
    -
    -The standard Java Native Interface (JNI) is used when bridging calls between the backend
and the Java VM.
    +Calls between the HAWQ backend and the Java VM are bridged using the standard Java Native
Interface (JNI).
     
     
     ## <a id="abouthawqpljava"></a>About HAWQ PL/Java 
     
    +PL/Java is embedded in your HAWQ product distribution or within your HAWQ build if you
chose to enable it as a build option. To use PL/Java in a HAWQ deployment, you must explicitly
enable the PL/Java extension in all desired databases.
    +
     There are a few key differences between the implementation of PL/Java in standard PostgreSQL
and HAWQ.
     
    -### <a id="pljavafunctions"></a>Functions 
    +### <a id="pljavafunctions"></a>Functions Not Supported
     
    -The following functions are not supported in HAWQ. The classpath is handled differently
in a distributed HAWQ environment than in the PostgreSQL environment.
    +The following functions are not supported in HAWQ. The PL/Java classpath is handled differently
in a distributed HAWQ environment than that of the PostgreSQL environment.
     
    -- sqlj.install_jar
    -- sqlj.install_jar
    -- sqlj.replace_jar
    -- sqlj.remove_jar
    -- sqlj.get_classpath
    -- sqlj.set_classpath
    +- sqlj.install_jar()
    +- sqlj.install_jar()
    +- sqlj.replace_jar()
    +- sqlj.remove_jar()
    +- sqlj.get_classpath()
    +- sqlj.set_classpath()
     
    -HAWQ uses the `pljava_classpath` server configuration parameter in place of the `sqlj.set_classpath`
function.
    +HAWQ uses the `pljava_classpath` server configuration parameter in place of the `sqlj.set_classpath()`
function.
     
     ### <a id="serverconfigparams"></a>Server Configuration Parameters 
     
    -PL/Java uses server configuration parameters to configure classpath, Java VM, and other
options. Refer to the [Server Configuration Parameter Reference](../reference/HAWQSiteConfig.html)
for general information about HAWQ server configuration parameters.
    +PL/Java uses server configuration parameters to configure the classpath, Java Virtual
Machine (VM), and other options. Refer to the [Server Configuration Parameter Reference](../reference/HAWQSiteConfig.html)
for general information about HAWQ server configuration parameters.
     
    -The following server configuration parameters are used by PL/Java in HAWQ. These parameters
replace the `pljava.*` parameters that are used in the standard PostgreSQL PL/Java implementation.
    +The following server configuration parameters are used by HAWQ PL/Java. These parameters
replace the `pljava.*` configuration parameters that are used in the standard PostgreSQL PL/Java
implementation.
     
     #### pljava\_classpath
     
    -A colon (:) separated list of the jar files containing the Java classes used in any PL/Java
functions. The jar files must be installed in the same locations on all HAWQ hosts. With the
trusted PL/Java language handler, jar file paths must be relative to the `$GPHOME/lib/postgresql/java/`
directory. With the untrusted language handler (javaU language tag), paths may be relative
to `$GPHOME/lib/postgresql/java/` or absolute.
    +A PL/Java user-defined function will appoint a static method in a Java class. In order
for the function to execute, the appointed class must be available on the class path specified
by the [`pljava_classpath`](../reference/guc/parameter_definitions.html#pljava_classpath)
HAWQ server configuration parameter.
     
    -#### pljava\_statement\_cache\_size
    -
    -Sets the size in KB of the Most Recently Used (MRU) cache for prepared statements.
    +`pljava_classpath` is a colon (:) separated list of the JAR files implementing the Java
classes referenced in any PL/Java user-defined functions. The JAR files must be installed
in the same locations on all HAWQ hosts. When using the trusted PL/Java language handler (`java`
language tag), JAR file paths must be relative to the `$GPHOME/lib/postgresql/java/` directory.
Relative or absolute JAR file paths may be used with the untrusted PL/Java language handler
(`javau` language tag).
     
     #### pljava\_release\_lingering\_savepoints
     
    -If TRUE, lingering savepoints will be released on function exit. If FALSE, they will
be rolled back.
    +[`pljava_release_lingering_savepoints`](../reference/guc/parameter_definitions.html#pljava_release_lingering_savepoints)
identifies the action to perform when lingering savepoints exist on exit of a PL/Java user-defined
function. If TRUE, lingering savepoints will be released on function exit. If FALSE, lingering
savepoints will be rolled back.
    +
    +#### pljava\_statement\_cache\_size
    +
    +[`pljava_statement_cache_size`](../reference/guc/parameter_definitions.html#pljava_statement_cache_size)
sets the size (in KB) of the Most Recently Used (MRU) cache for prepared statements.
     
     #### pljava\_vmoptions
     
    -Defines the start up options for the Java VM.
    +[`pljava_vmoptions`](../reference/guc/parameter_definitions.html#pljava_vmoptions) defines
the start-up options for the Java VM.
     
     ### <a id="setting_serverconfigparams"></a>Setting PL/Java Configuration
Parameters 
     
    -You can set PL/Java server configuration parameters at the session level, or globally
across your whole cluster. Your HAWQ cluster configuration must be reloaded after setting
a server configuration value globally.
    +You can set PL/Java server configuration parameters at the session level, or globally
across your whole cluster.
     
     #### <a id="setsrvrcfg_global"></a>Cluster Level
     
     You will perform different procedures to set a PL/Java server configuration parameter
for your whole HAWQ cluster depending upon whether you manage your cluster from the command
line or use Ambari. If you use Ambari to manage your HAWQ cluster, you must ensure that you
update server configuration parameters only via the Ambari Web UI. If you manage your HAWQ
cluster from the command line, you will use the `hawq config` command line utility to set
PL/Java server configuration parameters.
     
    +Your HAWQ cluster configuration must be reloaded after setting a server configuration
value globally.
    +
     The following examples add a JAR file named `myclasses.jar` to the `pljava_classpath`
server configuration parameter for the entire HAWQ cluster.
     
     If you use Ambari to manage your HAWQ cluster:
     
    -1. Set the `pljava_classpath` configuration property to include `myclasses.jar` via the
HAWQ service **Configs > Advanced > Custom hawq-site** drop down. 
    +1. Set the `pljava_classpath` configuration property **Key** **Value** to include `myclasses.jar`
via the HAWQ service **Configs > Advanced > Custom hawq-site** drop down. 
     2. Select **Service Actions > Restart All** to load the updated configuration.
     
     If you manage your HAWQ cluster from the command line:
     
    -1.  Log in to the HAWQ master host as a HAWQ administrator and source the file `/usr/local/hawq/greenplum_path.sh`.
    +1.  Log in to the HAWQ master node as a HAWQ administrator and set up your HAWQ environment.
     
         ``` shell
    -    $ source /usr/local/hawq/greenplum_path.sh
    +    $ ssh gpadmin@master
    +    gpadmin@master$ source /usr/local/hawq/greenplum_path.sh
         ```
     
     1. Use the `hawq config` utility to set `pljava_classpath`:
     
         ``` shell
    -    $ hawq config -c pljava_classpath -v \'myclasses.jar\'
    +    gpadmin@master$ hawq config -c pljava_classpath -v 'myclasses.jar'
         ```
    +    
    +    The `-c` option identifies the name of the server configuration parameter, `-v` identifies
the new value.
    +    
     2. Reload the HAWQ configuration:
     
         ``` shell
    -    $ hawq stop cluster -u
    +    gpadmin@master$ hawq stop cluster -u
         ```
     
     #### <a id="setsrvrcfg_session"></a>Session Level 
     
    -To set a PL/Java server configuration parameter for only the *current* database session,
set the parameter within the `psql` subsystem. For example, to set `pljava_classpath`:
    +To set a PL/Java server configuration parameter for only the *current* database session,
set the parameter within the `psql` subsystem. For example:
     	
     ``` sql
     => SET pljava_classpath='myclasses.jar';
     ```
     
    +## <a id="security"></a>Security 
     
    -## <a id="enablepljava"></a>Enabling and Removing PL/Java Support 
    +This section describes specific security considerations when using PL/Java.
    +
    +### <a id="installation"></a>Enable/Disable 
    +
    +Only a database superuser can enable/disable PL/Java. The PL/Java utility functions are
created using `SECURITY DEFINER` privileges; they execute with the access permissions that
were granted to the creator of the functions.
    +
    +### <a id="trustedlang"></a>Trusted/Untrusted Language 
     
    -The PL/Java extension must be explicitly enabled on each database in which it will be
used.
    +PL/Java includes trusted and untrusted language variants.
     
    +The PL/Java trusted language is named `java`. The trusted PL/Java language has no access
to the file system as stipulated by PostgreSQL definition of a trusted language. Any database
user can create and execute functions in the trusted `java` language.
    +
    +The PL/Java untrusted language is named `javau`. `javau` offers no way of restricting
what you can program in UDFs created with the language. Creating untrusted PL/Java UDFs is
permitted only by database superusers. Executing untrusted PL/Java UDFs is permitted only
by database superusers and other database users explicitly `GRANT`ed the permissions.
    +## <a id="enablepljava"></a>Enabling and Removing PL/Java Support 
    +
    +The PL/Java extension must be explicitly enabled on each database in which it will be
used. To remove PL/Java support, the extension must be explicitly removed from each database
in which it was used.
     
     ### <a id="pljavaprereq"></a>Prerequisites 
     
     Before you enable PL/Java:
     
     1. Ensure that you have installed a supported Java runtime environment and that the `$JAVA_HOME`
variable is set to the same path on the master and all segment nodes.
     
    -2. Perform the following step on all machines to set up `ldconfig` for the installed
JDK:
    +2. Perform the following step on all nodes in your HAWQ cluster to set up the shared
library cache for the installed JDK:
     
     	``` shell
    -	$ echo "$JAVA_HOME/jre/lib/amd64/server" > /etc/ld.so.conf.d/libjdk.conf
    -	$ ldconfig
    +	root@hawq-node$ echo "$JAVA_HOME/jre/lib/amd64/server" > /etc/ld.so.conf.d/libjdk.conf
    +	root@hawq-node$ ldconfig
     	```
     4. Make sure that your HAWQ cluster is running, you have sourced `greenplum_path.sh`
and that your `$GPHOME` environment variable is set.
     
    +    ``` shell
    +	$ ssh gpadmin@master
    +	gpadmin@master$ source /usr/local/hawq/greenplum_path.sh
    +	gpadmin@master$ hawq state
    +	```
     
     ### <a id="enablepljava"></a>Enable PL/Java and Install JAR Files 
     
    -To use PL/Java:
    +To use PL/Java, you must:
     
     1. Enable the language for each database.
    -1. Install user-created JAR files on all HAWQ hosts.
    -1. Add the names of the JAR files to the HAWQ `pljava_classpath` server configuration
parameter. This parameter value should identify a list of the installed JAR files.
    +1. Install custom JAR files on all HAWQ hosts.
    +1. Add the names of the JAR files to the HAWQ `pljava_classpath` server configuration
parameter. This parameter value should identify a list of all installed JAR files.
     
     #### <a id="enablepljava"></a>Enable PL/Java and Install JAR Files 
     
     Perform the following steps as the `gpadmin` user:
     
    -1. Enable PL/Java by running the `$GPHOME/share/postgresql/pljava/install.sql` SQL script
in the databases that will use PL/Java. The `install.sql` script registers both the trusted
and untrusted PL/Java languages. For example, the following command enables PL/Java on a database
named `testdb`:
    +1. Enable PL/Java by running the `$GPHOME/share/postgresql/pljava/install.sql` SQL script
on the databases that will use PL/Java. The `install.sql` script registers both the trusted
(`java`) and untrusted PL/Java (`javau`) languages. For example, the following command enables
the PL/Java languages on a database named `testdb`:
     
     	``` shell
    -	$ psql -d testdb -f $GPHOME/share/postgresql/pljava/install.sql
    +	gpadmin@master$ psql -d testdb -f $GPHOME/share/postgresql/pljava/install.sql
     	```
     	
     	To enable the PL/Java extension in all new HAWQ databases, run the script on the `template1`
database: 
     
         ``` shell
    -    $ psql -d template1 -f $GPHOME/share/postgresql/pljava/install.sql
    +    gpadmin@master$ psql -d template1 -f $GPHOME/share/postgresql/pljava/install.sql
         ```
     
    -    Use this option *only* if you are certain you want to enable PL/Java in all new databases.
    +    Use this option *only* if you are certain you want to enable both the trusted and
untrusted versions of PL/Java in all newly-created databases. This operation takes effect
on all databases created after the command is invoked.
     	
    -2. Copy your Java archives (JAR files) to `$GPHOME/lib/postgresql/java/` on all HAWQ
hosts. This example uses the `hawq scp` utility to copy the `myclasses.jar` file located in
the current directory:
    +2. Copy all custom Java JAR files to `$GPHOME/lib/postgresql/java/` on all HAWQ nodes.
This example uses the `hawq scp` utility to copy the `myclasses.jar` file located in the current
directory to the list of hosts identified in the `hawq_hosts` file, also located in the current
directory:
    --- End diff --
    
    I think it would be cleaner to remove "located in the current directly" from the sentence,
and just include `./` in the actual command.  The way it reads now suggest that the current
directory is kept on the destination hosts, but that's not what was intended.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message