hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (PIG-96) It should be possible to spill big databags to HDFS
Date Mon, 26 Jan 2009 20:03:59 GMT

     [ https://issues.apache.org/jira/browse/PIG-96?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-96.

    Resolution: Fixed

Based on the discussion, I don't see a reason to spill to DFS.

> It should be possible to spill big databags to HDFS
> ---------------------------------------------------
>                 Key: PIG-96
>                 URL: https://issues.apache.org/jira/browse/PIG-96
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>            Reporter: Pi Song
> Currently databags are only spilled to local disk, which costs 2 disk I/O operations. If
> databags are too big, this is not efficient.
> We should take advantage of HDFS: if a databag is too big (as determined by DataBag.getMemorySize()
> exceeding a large threshold), spill it to HDFS, and read it back from HDFS in parallel when the data
> is required.
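The threshold rule proposed in the description can be sketched as follows. This is a minimal illustration only, assuming a hypothetical SpillPolicy class and SpillTarget enum that do not exist in Pig. The bag's size is passed in as a plain long, standing in for the value DataBag.getMemorySize() would return, and the 1 GiB threshold is an arbitrary example value. The parallel read-back from HDFS is not modeled.

```java
// Hypothetical sketch of the spill-target rule proposed in PIG-96.
// SpillPolicy and SpillTarget are illustrative names, not Pig API.
// The bag size is modeled as a long, standing in for DataBag.getMemorySize().
public final class SpillPolicy {
    enum SpillTarget { LOCAL_DISK, HDFS }

    private final long hdfsThresholdBytes;

    SpillPolicy(long hdfsThresholdBytes) {
        if (hdfsThresholdBytes <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.hdfsThresholdBytes = hdfsThresholdBytes;
    }

    /** Bags strictly larger than the threshold go to HDFS; all others stay on local disk. */
    SpillTarget chooseTarget(long bagMemorySizeBytes) {
        return bagMemorySizeBytes > hdfsThresholdBytes
                ? SpillTarget.HDFS
                : SpillTarget.LOCAL_DISK;
    }

    public static void main(String[] args) {
        SpillPolicy policy = new SpillPolicy(1L << 30);      // hypothetical 1 GiB threshold
        System.out.println(policy.chooseTarget(64L << 20));  // 64 MiB bag -> LOCAL_DISK
        System.out.println(policy.chooseTarget(4L << 30));   // 4 GiB bag  -> HDFS
    }
}
```

Keeping the decision in a single comparison against a configurable threshold is what the description implies; the resolution above concluded that this extra spill path was not needed.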

