pig-dev mailing list archives

From "Taylor Finnell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2872) StoreFuncInterface.setStoreLocation gets a copy of a Configuration object
Date Thu, 10 Apr 2014 13:45:19 GMT

    [ https://issues.apache.org/jira/browse/PIG-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965357#comment-13965357 ]

Taylor Finnell commented on PIG-2872:
-------------------------------------

I was working with [~wattsinabox] when we ran into the issue. Our script is roughly as follows:

{code}
A = LOAD '...' USING CSVLoader ...;
STORE A INTO '/tmp/A-unused' USING DBStorage('org.postgresql.Driver', ..., 'INSERT INTO ....');
B = FOREACH A GENERATE X, Y, CONCAT(X, Y) as Z;
STORE B INTO '/tmp/B-unused' USING DBStorage('org.postgresql.Driver', ..., 'INSERT INTO ....');
{code}

Both DBStorage calls insert into different tables in the same database.

When the script is run, both A and B are stored in their /tmp/ locations. However, the data
never makes it into the database. We found two ways to get the data into the database:
the first was to add a DUMP B command after the assignment of B; the second was to run
the script with the -M flag, which turns off multi-query optimization.



> StoreFuncInterface.setStoreLocation gets a copy of a Configuration object
> -------------------------------------------------------------------------
>
>                 Key: PIG-2872
>                 URL: https://issues.apache.org/jira/browse/PIG-2872
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.11
>         Environment: Pig trunk, Hadoop 0.20.205 with Kerberos, ElasticSearch trunk, Wonderdog trunk
>            Reporter: Evert Lammerts
>
> When an implementation of StoreFuncInterface.setStoreLocation is called from JobControlCompiler.getJob, it is passed a copy of the Configuration that will be used for the Job that will be submitted:
> {code:title=JobControlCompiler.java}
> sFunc.setStoreLocation(st.getSFile().getFileName(), new org.apache.hadoop.mapreduce.Job(nwJob.getConfiguration()));
> {code}
> As far as I know, constructing a new org.apache.hadoop.mapreduce.Job creates a copy of the Configuration object passed to it. Thus anything the setStoreLocation implementation adds to that Configuration is not included in the Configuration of nwJob in JobControlCompiler.getJob.
> I noticed this going wrong in Wonderdog, which needs to ship the Elasticsearch configuration file through the DistributedCache. The file is added to mapred.cache.files in setStoreLocation, but that setting never makes it back into the Job returned by JobControlCompiler.getJob, so the file is never localized.
> This might be intentional within Pig, but I'm not familiar enough with StoreFuncs to know whether it is.
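The copy semantics described in the issue can be modelled without Hadoop on the classpath. The sketch below is a simplified illustration, not Pig or Hadoop code: `Conf`, `JobModel`, `addCacheFile` and the placeholder URI are hypothetical stand-ins, and it assumes, as the report does, that building a Job from a Configuration copies that Configuration. A Wonderdog-style `setStoreLocation` appends a file to `mapred.cache.files` on the Job it is handed, which mirrors the call in `getJob`.

```java
import java.util.HashMap;
import java.util.Map;

public class StoreLocationCopyDemo {
    static final String CACHE_FILES = "mapred.cache.files";

    // Hypothetical stand-in for Hadoop's Configuration: a string-to-string map.
    static final class Conf {
        final Map<String, String> props;

        Conf() { props = new HashMap<>(); }

        // Copy constructor: the copy owns a separate map, so later writes
        // to either object are not visible in the other.
        Conf(Conf other) { props = new HashMap<>(other.props); }

        String get(String key) { return props.get(key); }
        void set(String key, String value) { props.put(key, value); }
    }

    // Hypothetical stand-in for org.apache.hadoop.mapreduce.Job. Per the report,
    // constructing a Job from a Configuration copies it, so this does the same.
    static final class JobModel {
        private final Conf conf;
        JobModel(Conf conf) { this.conf = new Conf(conf); }
        Conf getConfiguration() { return conf; }
    }

    // Simplified model of appending a URI to the comma-separated cache-file list.
    static void addCacheFile(Conf conf, String uri) {
        String files = conf.get(CACHE_FILES);
        conf.set(CACHE_FILES, files == null ? uri : files + "," + uri);
    }

    // Hypothetical Wonderdog-style setStoreLocation: registers the Elasticsearch
    // config file (placeholder URI) on whatever Job it receives.
    static void setStoreLocation(String location, JobModel job) {
        addCacheFile(job.getConfiguration(), "hdfs:///placeholder/elasticsearch.yml");
    }

    public static void main(String[] args) {
        JobModel nwJob = new JobModel(new Conf());

        // Mirrors getJob: the StoreFunc gets a new Job built from nwJob's settings.
        JobModel handedToStoreFunc = new JobModel(nwJob.getConfiguration());
        setStoreLocation("/tmp/B-unused", handedToStoreFunc);

        System.out.println(handedToStoreFunc.getConfiguration().get(CACHE_FILES));
        System.out.println(nwJob.getConfiguration().get(CACHE_FILES));
    }
}
```

Running it prints the placeholder URI for the copy and `null` for `nwJob`: the cache-file entry exists only on the Job that was handed to the StoreFunc, which matches the reported symptom that the file is never localized for the submitted job.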



--
This message was sent by Atlassian JIRA
(v6.2#6252)
