hadoop-pig-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-997) [zebra] Sorted Table Support by Zebra
Date Sat, 31 Oct 2009 08:49:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772228#action_12772228 ]

Hadoop QA commented on PIG-997:

+1 overall.  Here are the results of testing the latest attachment 
  against trunk revision 831481.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 173 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/38/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/38/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/38/console

This message is automatically generated.

> [zebra] Sorted Table Support by Zebra
> -------------------------------------
>                 Key: PIG-997
>                 URL: https://issues.apache.org/jira/browse/PIG-997
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Yan Zhou
>            Assignee: Yan Zhou
>             Fix For: 0.6.0
>         Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch
> This new feature lets Zebra support sorted data in storage. As a storage library,
Zebra will not sort the data by itself, but it will support creation and use of sorted data
either through PIG or through map/reduce tasks that use Zebra as the storage format.
> The sorted table keeps the data in a "totally sorted" manner across all TFiles created
by potentially all mappers or reducers.
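
One reading of "totally sorted" is that each file is sorted on its own and that reading the files one after another still yields keys in ascending order. The sketch below checks that property on in-memory lists. It is only an illustration: `TotalOrderSketch` and `isTotallySorted` are hypothetical names, the lists stand in for TFiles, and none of this is Zebra's API.

```java
import java.util.List;

// Hypothetical illustration only; not part of Zebra.
final class TotalOrderSketch {
    /**
     * Returns true if reading the parts in the given order yields keys in
     * ascending (non-decreasing) order. That requires every part to be
     * sorted and no part to start below the last key of the part before it.
     */
    static <K extends Comparable<K>> boolean isTotallySorted(List<List<K>> parts) {
        K previous = null;
        for (List<K> part : parts) {
            for (K key : part) {
                if (previous != null && previous.compareTo(key) > 0) {
                    return false; // order broken inside a part or at a part boundary
                }
                previous = key;
            }
        }
        return true;
    }
}
```

Two parts that are each sorted but whose key ranges overlap, such as [1, 5] followed by [3, 7], fail this check, which is the difference between per-file sorting and a total order across files.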
> For sorted data creation through PIG's STORE operator, if the input data is sorted
through "ORDER BY", the new Zebra table will be marked as sorted on the sorted columns.
> For sorted data creation through Map/Reduce tasks, three new static methods of the BasicTableOutput
class will be provided to help the user achieve the goal. "setSortInfo" allows
the user to specify the sorted columns of the input tuple to be stored; "getSortKeyGenerator"
and "getSortKey" help the user generate a key that Zebra accepts as a sort key, based
upon the schema, the sorted columns and the input tuple.
> For sorted data read through PIG's LOAD operator, pass the string "sorted" as an extra argument
to the TableLoader constructor to ask for a sorted table to be loaded.
> For sorted data read through Map/Reduce tasks, a new static method of the TableInputFormat
class, requireSortedTable, can be called to ask for a sorted table to be read. Additionally,
an overloaded version of the new method can be called to ask for a table sorted on specified
sort columns with a specified comparator.
> For this release, sorted tables support sorting only in ascending order, not in descending
order. In addition, the sort keys must be of simple types, not complex types such as RECORD,
> Multiple-key sorting is supported, and the order of the sort keys is significant:
the first sort column is the primary sort key, the second is the secondary sort
key, etc.
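
The primary/secondary key ordering described above can be sketched as a column-by-column comparison in ascending order. This is a minimal illustration under assumptions: rows are modeled as plain `Object[]` arrays whose key columns hold non-null `Comparable` values, and `MultiKeySortSketch`, `byColumns` and `sortedCopy` are hypothetical names, not Zebra methods.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of Zebra.
final class MultiKeySortSketch {
    /**
     * Builds an ascending comparator over rows. Columns are compared in the
     * order listed, so the first index is the primary sort key, the second
     * is the secondary sort key, and so on.
     */
    static Comparator<Object[]> byColumns(int... sortColumns) {
        return (left, right) -> {
            for (int col : sortColumns) {
                @SuppressWarnings("unchecked")
                Comparable<Object> key = (Comparable<Object>) left[col];
                int result = key.compareTo(right[col]);
                if (result != 0) {
                    return result; // earlier columns decide before later ones
                }
            }
            return 0; // equal on every sort column
        };
    }

    /** Returns a sorted copy of the rows, leaving the input list untouched. */
    static List<Object[]> sortedCopy(List<Object[]> rows, int... sortColumns) {
        List<Object[]> copy = new ArrayList<>(rows);
        copy.sort(byColumns(sortColumns));
        return copy;
    }
}
```

Calling `sortedCopy(rows, 0, 1)` and `sortedCopy(rows, 1, 0)` on the same rows can give different results, which is why the order in which the sort columns are listed matters.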
> In this release, the sort keys are stored along with the sort columns from which the keys
were originally created, resulting in some data storage redundancy.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
