pig-dev mailing list archives

From "Vivek Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1842) Improve Scalability of the XMLLoader for large datasets such as wikipedia
Date Wed, 09 Feb 2011 10:34:57 GMT

     [ https://issues.apache.org/jira/browse/PIG-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Padmanabhan updated PIG-1842:
-----------------------------------

    Attachment: PIG-1842_2.patch

Attaching the patch again

> Improve Scalability of the XMLLoader for large datasets such as wikipedia
> -------------------------------------------------------------------------
>
>                 Key: PIG-1842
>                 URL: https://issues.apache.org/jira/browse/PIG-1842
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0, 0.8.0, 0.9.0
>            Reporter: Viraj Bhat
>            Assignee: Vivek Padmanabhan
>             Fix For: 0.7.0, 0.8.0, 0.9.0
>
>         Attachments: PIG-1842_1.patch, PIG-1842_2.patch
>
>
> The current XMLLoader for Pig does not work well for large datasets such as the Wikipedia
> dataset. Each mapper reads in the entire XML file, resulting in extremely slow run times.
> Viraj

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
