hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gunther Hagleitner (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-758) Converting load/store locations into fully qualified absolute paths
Date Wed, 08 Apr 2009 19:57:13 GMT
Converting load/store locations into fully qualified absolute paths
-------------------------------------------------------------------

                 Key: PIG-758
                 URL: https://issues.apache.org/jira/browse/PIG-758
             Project: Pig
          Issue Type: Bug
            Reporter: Gunther Hagleitner


As part of the multiquery optimization work there is a need to use absolute paths for load
and store operations (because the current directory changes during the execution of the script).
In order to do so, we are suggesting a change to the semantics of the location/filename string
used in LoadFunc and Slicer/Slice.

The proposed change is:

   * Load locations without a scheme part are expected to be hdfs (mapreduce mode) or local
(local mode) paths
   * Any hdfs or local path will be translated to a fully qualified absolute path before it
is handed to either a LoadFunc or Slicer
   * Any scheme other than "file" or "hdfs" will result in the load path to be passed through
to the LoadFunc or Slicer without any modification.

Example:

If you have a LoadFunc that reads from a database, in the current system the following could
be used:

{{{
a = load 'table' using DBLoader();
}}}

With the proposed changes table would be translated into an hdfs path though ("hdfs://..../table").
Probably not what the DBLoader would want to see. In order to make it work one could use:

{{{
a = load 'sql://table' using DBLoader();
}}}

Now the DBLoader would see the unchanged string "sql://table".

This is an incompatible change, but hopefully not affecting many existing Loaders/Slicers.
Since this is needed with the multiquery feature, the behavior can be reverted back by using
the "no_multiquery" pig flag.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message