hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (PIG-10) reduce encoding of intermediate results
Date Fri, 02 Nov 2007 16:23:50 GMT

     [ https://issues.apache.org/jira/browse/PIG-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates resolved PIG-10.
---------------------------

    Resolution: Invalid

There is no requirement in Pig that every tuple in a relation share the same schema, so storing
the schema once up front in intermediate results will not always be an option. Even when the
schema is known, tuples can contain complex data types with no guaranteed schema (such as maps),
and those would still require markers in the encoded data. We could optimize for the case where
all tuples share the same schema and contain only atomic data, but it's not clear how we would
know that to be the case.
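To make the last point concrete, here is a minimal Java sketch of the check that such an optimization would need. It is an illustration only, not Pig code. It assumes that a tuple is a `List<Object>` and that "atomic" means `String`, `Integer`, `Long` or `Double`. `UniformSchemaCheck` and `uniformAtomicSchema` are hypothetical names, not Pig APIs. The method returns the shared per-position types only if every tuple agrees with the others and none holds a complex value such as a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UniformSchemaCheck {
    // Hypothetical definition of an "atomic" field, for this sketch only.
    static boolean isAtomic(Object field) {
        return field instanceof String || field instanceof Integer
            || field instanceof Long || field instanceof Double;
    }

    // Returns the per-position field classes shared by every tuple, or null
    // if any tuple holds a non-atomic field (e.g. a map) or disagrees with the
    // others in arity or field types. Also returns null for an empty input.
    public static List<Class<?>> uniformAtomicSchema(List<List<Object>> tuples) {
        List<Class<?>> schema = null;
        for (List<Object> tuple : tuples) {
            List<Class<?>> current = new ArrayList<>();
            for (Object field : tuple) {
                if (!isAtomic(field)) {
                    return null; // complex types carry no guaranteed schema
                }
                current.add(field.getClass());
            }
            if (schema == null) {
                schema = current;            // first tuple defines the candidate schema
            } else if (!schema.equals(current)) {
                return null;                 // tuples in the relation disagree
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        List<List<Object>> uniform = Arrays.asList(
            Arrays.<Object>asList("a", 1),
            Arrays.<Object>asList("b", 2));
        List<List<Object>> withMap = Arrays.asList(
            Arrays.<Object>asList("a", 1),
            Arrays.<Object>asList("b", Collections.singletonMap("k", "v")));
        System.out.println(uniformAtomicSchema(uniform)); // [class java.lang.String, class java.lang.Integer]
        System.out.println(uniformAtomicSchema(withMap)); // null
    }
}
```

In this sketch, the answer is only known after every tuple has been examined. A writer that emits tuples as they arrive would not have that information when it writes the first one.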

> reduce encoding of intermediate results
> ---------------------------------------
>
>                 Key: PIG-10
>                 URL: https://issues.apache.org/jira/browse/PIG-10
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Olga Natkovich
>
> Currently, in intermediate results, the data is written with a marker for every column in every row. For instance, if
> we are writing a row that has a schema of bag, atom, we'll write:
> BAGMARKER BAGDATA ATOMMARKER ATOMDATA
> There's no reason to write the markers for every row. It should be sufficient to write them once at the beginning of the
> file and then remember them for subsequent rows.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

