hadoop-pig-dev mailing list archives

From "Bill Graham (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1174) Creation of output path should be done by storage function
Date Mon, 28 Dec 2009 02:16:33 GMT
Creation of output path should be done by storage function
----------------------------------------------------------

                 Key: PIG-1174
                 URL: https://issues.apache.org/jira/browse/PIG-1174
             Project: Pig
          Issue Type: Bug
            Reporter: Bill Graham


When executing a STORE command, Pig creates the output location before the storage function
gets called. This causes problems with storage functions that have logic to determine the
output location. See this thread:

http://www.mail-archive.com/pig-user%40hadoop.apache.org/msg01538.html

For example, when making a request like this:

STORE A INTO '/my/home/output' USING MultiStorage('/my/home/output','0', 'none', '\t');

Pig creates a file at '/my/home/output', and an exception is then thrown when MultiStorage
tries to create a directory under that path. The workaround is to specify a dummy location
in the INTO clause instead, like so:

STORE A INTO '/my/home/output/temp' USING MultiStorage('/my/home/output','0', 'none', '\t');
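
The failure mode and the workaround can be reproduced outside Pig. The sketch below is only a
local-filesystem analogy using java.nio.file, not Pig or HDFS code; the class name
OutputPathCollision, the helper canCreateSubdir, and the "key1" subdirectory are made up for
illustration. It assumes, as described above, that the INTO location ends up existing as a
regular file before the storage function runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: a local-filesystem analogy, not Pig or HDFS code.
final class OutputPathCollision {

    /** Returns true if a directory named child can be created under output. */
    static boolean canCreateSubdir(Path output, String child) {
        try {
            Files.createDirectories(output.resolve(child));
            return true;
        } catch (IOException e) {
            // Raised when output already exists as a regular file.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("pig-1174");

        // Failing case: the INTO location is pre-created as a file.
        Path collided = root.resolve("output");
        Files.createFile(collided);
        System.out.println("pre-created as file: " + canCreateSubdir(collided, "key1"));

        // Workaround: the pre-created file is a dummy child ("temp"),
        // so the real output location remains a directory.
        Path workaround = root.resolve("output2");
        Files.createDirectories(workaround);
        Files.createFile(workaround.resolve("temp"));
        System.out.println("dummy INTO path: " + canCreateSubdir(workaround, "key1"));
    }
}
```

The first call fails because a regular file cannot have children; the second succeeds because
only the dummy 'temp' entry was pre-created.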

Two changes should be made:
1. The path specified in the INTO clause should be available to the storage function so it
doesn't need to be duplicated.
2. The creation of the output paths should be delegated to the storage function.
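
One way to picture the two changes together is sketched below. This is a hypothetical shape,
not Pig's actual storage API: the interface OutputPreparingStorage, its method prepareOutput,
and the class SplitByKeyStorage are invented for illustration, and java.nio.file stands in
for the Hadoop filesystem. The caller passes the INTO path to the storage function without
creating anything, and the storage function decides what to create there.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical interface, NOT part of Pig: the engine hands over the INTO
// location and leaves its creation to the storage function.
interface OutputPreparingStorage {
    void prepareOutput(Path intoLocation) throws IOException;
}

// Hypothetical storage function that writes one subdirectory per key.
final class SplitByKeyStorage implements OutputPreparingStorage {
    private Path outputDir;

    @Override
    public void prepareOutput(Path intoLocation) throws IOException {
        // The storage function chooses to make the location a directory.
        outputDir = Files.createDirectories(intoLocation);
    }

    /** Creates (if needed) and returns the subdirectory for one key. */
    Path directoryForKey(String key) throws IOException {
        if (outputDir == null) {
            throw new IllegalStateException("prepareOutput was not called");
        }
        return Files.createDirectories(outputDir.resolve(key));
    }
}

final class DelegatedOutputDemo {
    public static void main(String[] args) throws IOException {
        Path into = Files.createTempDirectory("pig-1174").resolve("output");

        SplitByKeyStorage storage = new SplitByKeyStorage();
        storage.prepareOutput(into); // the path is given once; nothing is pre-created
        Path keyDir = storage.directoryForKey("key1");

        System.out.println(Files.isDirectory(keyDir));
    }
}
```

With this shape the output path is specified once, and no file exists at the location before
the storage function has decided how to lay out its output.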

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

