accumulo-dev mailing list archives

From "Eric Newton (Resolved) (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (ACCUMULO-375) Wikipedia Ingest needs more parallelism
Date Tue, 03 Apr 2012 15:38:24 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton resolved ACCUMULO-375.
----------------------------------

    Resolution: Not A Problem
    
> Wikipedia Ingest needs more parallelism
> ---------------------------------------
>
>                 Key: ACCUMULO-375
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-375
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Adam Fuchs
>            Assignee: Adam Fuchs
>
> The Wikipedia ingest map job uses a derivative of FileInputFormat, which launches
> one mapper per file. Given the partitioning strategy and workload distribution, it makes sense
> to launch multiple mappers per file. Each mapper can then take a chunk of the articles in
> the file, using the same partitioning strategy as the assignment of row IDs.
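
The proposal above can be illustrated with a minimal, self-contained sketch. It is not the Accumulo Wikipedia ingest code: the class and method names (ArticleChunking, partitionOf, ownsArticle, articlesFor) are hypothetical, and hashing the article ID modulo the partition count is only an assumed stand-in for whatever partitioning the ingest really uses for row IDs. The idea shown is that several mappers read the same file and each keeps only the articles whose partition maps to its own index.

```java
import java.util.ArrayList;
import java.util.List;

public class ArticleChunking {

    // Assumed partition function: article ID hashed into one of numPartitions buckets.
    // The real ingest may derive row IDs differently; this is illustrative only.
    static int partitionOf(int articleId, int numPartitions) {
        return Math.floorMod(Integer.hashCode(articleId), numPartitions);
    }

    // A mapper owns an article when the article's partition, reduced modulo the
    // number of mappers per file, equals the mapper's index.
    static boolean ownsArticle(int articleId, int numPartitions,
                               int mapperIndex, int mappersPerFile) {
        return partitionOf(articleId, numPartitions) % mappersPerFile == mapperIndex;
    }

    // Collects the articles from one file that a given mapper would process.
    static List<Integer> articlesFor(int[] articleIds, int numPartitions,
                                     int mapperIndex, int mappersPerFile) {
        List<Integer> owned = new ArrayList<>();
        for (int id : articleIds) {
            if (ownsArticle(id, numPartitions, mapperIndex, mappersPerFile)) {
                owned.add(id);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        int[] articleIds = {0, 1, 2, 3, 4, 5, 6, 7};
        int numPartitions = 4;   // assumed number of row ID partitions
        int mappersPerFile = 2;  // assumed number of mappers sharing one file
        for (int m = 0; m < mappersPerFile; m++) {
            System.out.println("mapper " + m + " -> "
                + articlesFor(articleIds, numPartitions, m, mappersPerFile));
        }
    }
}
```

Because every article falls into exactly one partition, and every partition maps to exactly one mapper index, each article in the file is processed exactly once, and all articles of a given partition are handled by the same mapper.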

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
