pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1842) Improve Scalability of the XMLLoader for large datasets such as wikipedia
Date Thu, 03 Mar 2011 20:29:37 GMT

    [ https://issues.apache.org/jira/browse/PIG-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002190#comment-13002190 ]

Alan Gates commented on PIG-1842:
---------------------------------

Patch 2 checked into 0.8 branch.

> Improve Scalability of the XMLLoader for large datasets such as wikipedia
> -------------------------------------------------------------------------
>
>                 Key: PIG-1842
>                 URL: https://issues.apache.org/jira/browse/PIG-1842
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0, 0.8.0, 0.9.0
>            Reporter: Viraj Bhat
>            Assignee: Vivek Padmanabhan
>             Fix For: 0.7.0, 0.8.0, 0.9.0
>
>         Attachments: PIG-1842_1.patch, PIG-1842_2.patch, TEST-org.apache.pig.piggybank.test.storage.TestXMLLoader.txt
>
>
> The current XMLLoader for Pig does not work well for large datasets such as the Wikipedia
> dataset. Each mapper reads in the entire XML file, resulting in extremely slow run times.
> Viraj

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
