cassandra-commits mailing list archives

From "Johan Oskarsson (JIRA)" <j...@apache.org>
Subject [jira] Updated: (CASSANDRA-890) Get Hadoop input format sub splits in parallel
Date Wed, 17 Mar 2010 10:06:27 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Oskarsson updated CASSANDRA-890:
--------------------------------------

    Attachment: CASSANDRA-890.patch

Updated to put the brackets in the right place. 

However, with regard to the Executor, I prefer that approach. Is that a blocker for you?
It's a bit bloated, but it divides everything into components that are easy to understand.
It also saves us from accessing a collection from multiple threads, a very minor gain I admit.

I also thought it would be good to make it easy to limit how many threads run at the same
time in a future version by swapping out the executor type. Someone who runs very large jobs
might need this to avoid swamping the machine with threads; it is not unlikely to be
required at Twitter.
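
For readers following the thread, the Executor approach described above can be sketched roughly as follows. This is a minimal illustration in current Java, not the contents of CASSANDRA-890.patch: the class name SubSplitFetcher, the method fetchAll, the use of plain strings for ranges and splits, and the fetchSubSplits function are all hypothetical stand-ins. Each task returns its own list and the caller merges the results through Futures, so no collection is shared between threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: names and types are illustrative assumptions,
// not taken from the actual patch.
class SubSplitFetcher {

    // Submits one task per range and merges the results in submission order.
    // Each task builds and returns its own list, so worker threads never
    // touch a shared collection; only the calling thread writes to `splits`.
    static List<String> fetchAll(List<String> ranges,
                                 ExecutorService executor,
                                 Function<String, List<String>> fetchSubSplits)
            throws InterruptedException, ExecutionException {
        List<Future<List<String>>> futures = new ArrayList<>();
        for (String range : ranges) {
            Callable<List<String>> task = () -> fetchSubSplits.apply(range);
            futures.add(executor.submit(task));
        }

        List<String> splits = new ArrayList<>();
        for (Future<List<String>> future : futures) {
            // get() blocks until that task finishes and rethrows its failure, if any.
            splits.addAll(future.get());
        }
        return splits;
    }

    public static void main(String[] args) throws Exception {
        // A fixed pool caps the number of concurrent fetches at 4.
        // Swapping in Executors.newCachedThreadPool() would remove the cap.
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<String> splits = fetchAll(
                    Arrays.asList("range-a", "range-b", "range-c"),
                    executor,
                    // Stand-in for the remote call that asks a node for sub splits.
                    range -> Arrays.asList(range + "/0", range + "/1"));
            System.out.println(splits);
        } finally {
            executor.shutdown();
        }
    }
}
```

Because fetchAll only depends on the ExecutorService interface, the thread limit is decided entirely by which executor the caller passes in, which is the kind of swap mentioned above.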

> Get Hadoop input format sub splits in parallel
> ----------------------------------------------
>
>                 Key: CASSANDRA-890
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-890
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib
>            Reporter: Johan Oskarsson
>            Assignee: Johan Oskarsson
>             Fix For: 0.7
>
>         Attachments: CASSANDRA-890.patch, CASSANDRA-890.patch
>
>
> To improve Hadoop job startup time we can multithread parts of the input format. Specifically
> the fetching of "sub splits" from many nodes can be run in parallel.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

