pig-dev mailing list archives

From "Gunther Hagleitner (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (PIG-757) Using schemes in load and store paths
Date Wed, 08 Apr 2009 22:28:12 GMT

     [ https://issues.apache.org/jira/browse/PIG-757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner resolved PIG-757.

    Resolution: Duplicate

> Using schemes in load and store paths
> -------------------------------------
>                 Key: PIG-757
>                 URL: https://issues.apache.org/jira/browse/PIG-757
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
> As part of the multiquery optimization work there's a need to use absolute paths for
> load and store operations (because the current directory changes during the execution of the
> script). In order to do so, the suggestion is to change the semantics of the location/filename
> string used in LoadFunc and Slicer/Slice.
> The proposed change is:
>    * Load locations without a scheme part are expected to be hdfs (mapreduce mode) or
>      local (local mode) paths.
>    * Any hdfs or local path will be translated to a fully qualified absolute path before
>      it is handed to either a LoadFunc or a Slicer.
>    * Any scheme other than file or hdfs will result in the load path being passed through
>      to the LoadFunc or Slicer without any modification.
> Example:
> If you have a LoadFunc that reads from a database, right now the following could be used:
> {{{
> a = load 'table' using DBLoader();
> }}}
> With the proposed changes, however, 'table' would be translated into an hdfs path ("hdfs://..../table"),
> which is probably not what the loader wants to see. To make this work, one would use:
> {{{
> a = load 'sql://table' using DBLoader();
> }}}
> Now the DBLoader would see the unchanged string "sql://table", and pig will not use the
> string as an hdfs location.
> This is an incompatible change, but hopefully few existing Slicers/Loaders are
> affected. This behavior is part of the multiquery work and can be turned off (reverted)
> by using the "no_multiquery" flag.
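
The three proposed rules can be sketched as a small function. This is only an illustration in Python, not Pig's actual implementation (Pig itself is written in Java): the function name `qualify_location`, its parameters, and the example namenode authority and working directories are all hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def qualify_location(location, mode, cwd, hdfs_authority):
    """Illustrative only: apply the three proposed rules to a load/store location.

    mode           -- "mapreduce" or "local" (hypothetical parameter)
    cwd            -- current working directory used to resolve relative paths
    hdfs_authority -- hypothetical default namenode, e.g. "namenode:8020"
    """
    parts = urlparse(location)
    scheme = parts.scheme

    # Rule 3: any scheme other than file or hdfs is passed through unmodified.
    if scheme not in ("", "hdfs", "file"):
        return location

    if scheme == "":
        # Rule 1: no scheme means hdfs in mapreduce mode, local otherwise.
        scheme = "hdfs" if mode == "mapreduce" else "file"
        authority = hdfs_authority if scheme == "hdfs" else ""
        path = location
    else:
        authority = parts.netloc
        if scheme == "hdfs" and not authority:
            authority = hdfs_authority
        path = parts.path

    # Rule 2: translate to a fully qualified absolute path.
    abs_path = posixpath.normpath(posixpath.join(cwd, path))
    return f"{scheme}://{authority}{abs_path}"


# Usage with made-up working directories and namenode:
print(qualify_location("table", "mapreduce", "/user/alice", "namenode:8020"))
print(qualify_location("table", "local", "/home/alice", "namenode:8020"))
print(qualify_location("sql://table", "mapreduce", "/user/alice", "namenode:8020"))
```

Under these assumptions the bare name 'table' is resolved against the working directory, while "sql://table" reaches the loader untouched, which matches the DBLoader example in the description.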

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
