hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (PIG-241) Sharding and joins
Date Fri, 15 Jan 2010 06:10:54 GMT

     [ https://issues.apache.org/jira/browse/PIG-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates resolved PIG-241.
----------------------------

    Resolution: Won't Fix

We have chosen a different approach to this.  Our merge join does take advantage of sort order,
but, unlike the suggested sharding approach, it does not require that the data be partitioned
in the same way in order to do the join.
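
To illustrate the general idea of a join that relies only on sort order, here is a minimal sketch of a textbook sort-merge inner join. It is not Pig's implementation; the function name `merge_join` and the in-memory list-of-pairs representation are assumptions made for the example. Both inputs must already be sorted by key, but nothing is assumed about how they are partitioned.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs, each sorted by key.

    Hypothetical helper for illustration only; not part of Pig.
    """
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key too small, advance left
        elif lk > rk:
            j += 1          # right key too small, advance right
        else:
            # Find the run of rows in `right` that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every matching left row with every row in that run.
            while i < len(left) and left[i][0] == lk:
                for jj in range(j, j_end):
                    out.append((lk, left[i][1], right[jj][1]))
                i += 1
            j = j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(left, right)
```

Each input is scanned once from front to back, which is why sort order alone is enough for this style of join.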

> Sharding and joins
> ------------------
>
>                 Key: PIG-241
>                 URL: https://issues.apache.org/jira/browse/PIG-241
>             Project: Pig
>          Issue Type: New Feature
>          Components: data
>            Reporter: John DeTreville
>
> Many large distributed systems for storage and computing over tables divide these tables
> into smaller _shards_, such that all rows with the same (primary) key will appear in the same
> shard. If two tables are consistently sharded, then they can be joined shard-by-shard. If
> corresponding shards are stored on the same hosts (or racks), then joins can be performed
> locally on those hosts without copying the rows of the tables over the network; this can produce
> significant speedups.
>
> Pig does not currently provide application-controlled sharding and the associated shard
> placement and computation placement. The performance of joins therefore suffers in many scenarios;
> rows are passed over the network multiple times when performing a join. If Pig (and Hadoop)
> could provide the ability for the application to shard tables consistently, according to an
> application-controlled policy, joins could be completely local operations and could in many
> cases perform much better.
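
The shard-by-shard join described in the quoted issue can be sketched roughly as follows. This is a single-process illustration under stated assumptions, not a Pig or Hadoop feature: `shard_of`, `shard_table`, `join_shard`, and `sharded_join` are hypothetical names, and the CRC32-based policy merely stands in for an "application-controlled policy". In a real cluster each pair of corresponding shards would sit on the same host, so the per-shard join would need no network transfer.

```python
import zlib
from collections import defaultdict


def shard_of(key, num_shards):
    # Hypothetical sharding policy: a deterministic hash of the key.
    # Any policy works as long as both tables use the same one.
    return zlib.crc32(repr(key).encode("utf-8")) % num_shards


def shard_table(rows, num_shards):
    # Split (key, value) rows so that equal keys land in the same shard.
    shards = [[] for _ in range(num_shards)]
    for key, value in rows:
        shards[shard_of(key, num_shards)].append((key, value))
    return shards


def join_shard(left_shard, right_shard):
    # Local hash join of one pair of corresponding shards.
    index = defaultdict(list)
    for key, value in right_shard:
        index[key].append(value)
    return [(key, lv, rv)
            for key, lv in left_shard
            for rv in index.get(key, [])]


def sharded_join(left_rows, right_rows, num_shards):
    # Because both tables are sharded consistently, shard i of the left
    # table only ever needs shard i of the right table.
    left = shard_table(left_rows, num_shards)
    right = shard_table(right_rows, num_shards)
    result = []
    for left_shard, right_shard in zip(left, right):
        result.extend(join_shard(left_shard, right_shard))
    return result


users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(2, "book"), (3, "pen"), (3, "ink"), (9, "cup")]
joined = sharded_join(users, orders, num_shards=4)
```

The result contains the same rows as an unsharded join, although their order depends on the sharding policy.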

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

