pig-dev mailing list archives

From "Chris Pesto (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-2099) AllLoader in trunk does not work properly with JSON schemas
Date Fri, 27 May 2011 18:52:47 GMT
AllLoader in trunk does not work properly with JSON schemas

                 Key: PIG-2099
                 URL: https://issues.apache.org/jira/browse/PIG-2099
             Project: Pig
          Issue Type: Bug
          Components: data
    Affects Versions: 0.8.1, 0.8.0
            Reporter: Chris Pesto

The AllLoader in the Piggybank in trunk does not pass JSON-defined schemas to the child loaders
it instantiates.  If the schema is defined in the LOAD function, then when Pig calls getSchema
on the AllLoader, the AllLoader instantiates the child loader and calls the child's getSchema,
provided the child implements the LoadMetadata interface.  If the AllLoader instead finds the
JSON schema in a file, it does not instantiate the child loader until prepareToRead is called,
so the child never receives the schema.  I have worked around this by adding to the AllLoader:

        transient String location = null;
        transient Job job = null;

then in AllLoader::setLocation:

        this.location = location;
        this.job = job;

then in AllLoader::prepareToRead:

        // Ask the child for its schema so that it picks up the JSON-defined schema.
        if (childLoadFunc instanceof LoadMetadata) {
                ((LoadMetadata) childLoadFunc).getSchema(location, job);
        }

Although I suspect it is not good practice to store the location and job in instance fields
like that, I don't know a better way to fix this.
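
The workaround above can be sketched in isolation as follows.  This is only a minimal
illustration, not the actual Piggybank code: SchemaSource, RecordingChild and
DeferringLoaderSketch are hypothetical stand-ins for LoadMetadata, a child loader and the
AllLoader, and the job is reduced to a plain String so the example runs without Pig or Hadoop
on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Pig's LoadMetadata. The real getSchema takes a
// Hadoop Job and returns a ResourceSchema; Strings are used here for brevity.
interface SchemaSource {
    String getSchema(String location, String job);
}

// Hypothetical child loader that records every schema request it receives.
class RecordingChild implements SchemaSource {
    final List<String> schemaRequests = new ArrayList<>();

    @Override
    public String getSchema(String location, String job) {
        schemaRequests.add(location + "@" + job);
        return "schema-for:" + location;
    }
}

// Hypothetical wrapper mirroring the workaround: remember the arguments in
// setLocation, then forward them to the child once it exists in prepareToRead.
class DeferringLoaderSketch {
    private transient String location;
    private transient String job;
    private Object childLoadFunc; // only known at prepareToRead time

    void setLocation(String location, String job) {
        this.location = location;
        this.job = job;
    }

    // Assumption: the child is passed in here; the real AllLoader creates it itself.
    void prepareToRead(Object newChild) {
        childLoadFunc = newChild;
        if (childLoadFunc instanceof SchemaSource) {
            ((SchemaSource) childLoadFunc).getSchema(location, job);
        }
    }
}

class DeferringLoaderDemo {
    public static void main(String[] args) {
        RecordingChild child = new RecordingChild();
        DeferringLoaderSketch loader = new DeferringLoaderSketch();
        loader.setLocation("/data/input", "job-1");
        loader.prepareToRead(child);
        System.out.println(child.schemaRequests); // prints [/data/input@job-1]
    }
}
```

The sketch shows the same trade-off noted above: the wrapper has to hold on to state from
setLocation because nothing else hands the location and job to prepareToRead.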


Also, getFuncSpecFromContent in the LoadFuncHelper class that accompanies the AllLoader should
be changed to:

        funcSpec = new FuncSpec("org.apache.pig.piggybank.storage.PigStorageSchema()");

since it currently instantiates a normal PigStorage object, which does not understand pre-defined
schemas.  The documentation for the AllLoader should reference PigStorageSchema instead of
PigStorage as well.
