helix-user mailing list archives

From Maharajan Nachiappa <maharajan.na...@gmail.com>
Subject Fwd: Helix parallelism
Date Wed, 03 Sep 2014 17:49:41 GMT
Hi Kishore/Kanak,

Thanks very much for the guidance. I have tested the feature with a minimal number of nodes and it works as expected, though I have not yet tested it exhaustively.

I have a question: is there a way to get the resulting data back to the client for consolidation or aggregation, as an option alongside the status and info in the TaskResult object? For example, returning 1 or 2 KB of results from each of 5 task participants as an optional data object, similar to the map-reduce concept but on a real-time basis, giving the client the opportunity to consolidate the results.
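As the question notes, TaskResult carries only a status and an info string, so larger result payloads would need another channel. Purely as an illustration of the fan-out/aggregate pattern being asked about (not the Helix API; `runQuery` and the class name are hypothetical), here is a minimal sketch using plain `java.util.concurrent`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ResultAggregation {
    // Stand-in for a participant's work: each returns a small result
    // payload (1-2 KB in the scenario described above).
    static String runQuery(int participantId) {
        return "result-from-participant-" + participantId;
    }

    // Fan work out to the participants, then consolidate the partial
    // results back in the client, map-reduce style.
    public static List<String> fanOutAndAggregate(int participants) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(participants);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < participants; i++) {
                final int id = i;
                futures.add(pool.submit(() -> runQuery(id)));
            }
            List<String> aggregated = new ArrayList<>();
            for (Future<String> f : futures) {
                aggregated.add(f.get()); // blocks until that participant finishes
            }
            return aggregated;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fanOutAndAggregate(5));
    }
}
```

In practice the consolidation step could live in whatever client submitted the workflow; the sketch only shows the shape of collecting five small partial results into one place.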


On Aug 22, 2014, at 8:05 AM, kishore g <g.kishore@gmail.com> wrote:

Not sure if you are subscribed to the mailing list

---------- Forwarded message ----------
From: "Kanak Biscuitwala" <kanak.b@hotmail.com>
Date: Aug 21, 2014 10:02 AM
Subject: RE: Helix parallelism
To: "user@helix.apache.org" <user@helix.apache.org>

Yes, you can use the task framework, which hasn't been released yet but will be soon. For more on the task framework, you can read this blog post: http://engineering.linkedin.com/distributed-systems/ad-hoc-task-management-apache-helix

You can submit a job with 1000 tasks using either Java or YAML.

The YAML specification of this job would look something like:

name: MyWorkflow
jobs:
  - name: RunQueries
    command: RunQuery # The command corresponding to Task callbacks
    jobConfigMap: { # Arbitrary key-value pairs to pass to all tasks in this job
      k1: "v1",
      k2: "v2"
    }
    numConcurrentTasksPerInstance: 200 # Max parallelism per instance
    tasks: # Schedule 1000 tasks, each responsible for aggregating requests for a chunk of partitions
      - taskConfigMap: { # Arbitrary key-value pairs to pass to this task
          query: "query1"
        }
      - taskConfigMap: {
          query: "query2"
        }
      - taskConfigMap: {
          query: "query3"
        } # Repeat for the remaining 997 tasks

You can also see this class for an example of how to build jobs in Java: https://github.com/apache/helix/blob/master/helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java

Then you just need to implement a Task callback and register it on each of the instances,
and Helix will take care of assignment and retries.
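Such a Task callback is essentially a class with a run() method that returns a status. As a self-contained sketch, assuming the per-task config map from the YAML above (the `Task`/`TaskResult`/`Status` types below are minimal stand-ins mirroring the shape of the Helix task-framework interfaces, not the real `org.apache.helix.task` classes):

```java
import java.util.Map;

public class QueryTaskSketch {
    // Minimal stand-ins for the task-framework types (shape only).
    enum Status { COMPLETED, FAILED }

    static class TaskResult {
        final Status status;
        final String info;
        TaskResult(Status status, String info) { this.status = status; this.info = info; }
    }

    interface Task {
        TaskResult run();
        void cancel();
    }

    // A callback that executes the query named in the per-task config map.
    static class QueryTask implements Task {
        private final Map<String, String> taskConfig;
        private volatile boolean cancelled = false;

        QueryTask(Map<String, String> taskConfig) { this.taskConfig = taskConfig; }

        @Override
        public TaskResult run() {
            String query = taskConfig.get("query"); // e.g. "query1" from the YAML above
            if (cancelled || query == null) {
                return new TaskResult(Status.FAILED, "cancelled or no query configured");
            }
            // ... execute the query against the local store here ...
            return new TaskResult(Status.COMPLETED, "ran " + query);
        }

        @Override
        public void cancel() { cancelled = true; }
    }

    public static void main(String[] args) {
        Task t = new QueryTask(Map.of("query", "query1"));
        System.out.println(t.run().info);
    }
}
```

With the real framework, a factory producing such tasks is registered on each participant under the job's command name, and the controller handles scheduling.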
Date: Thu, 21 Aug 2014 09:07:11 -0700
Subject: Helix parallelism
From: maharajan.nachi@gmail.com
To: user@helix.apache.org


I just started looking at Helix's capability to execute tasks in parallel, distributed evenly across the cluster's instances and resources.

I have a requirement to execute many different queries in parallel. Can Helix help in this case?

For example:
1. I have some 1000 different queries to be executed.
2. I have 5 nodes configured in the Helix cluster, each capable of executing a set of queries.
3. I need Helix to distribute these 1000 queries equally across the 5 nodes (200 per node), take care of re-executing any failed queries, and notify the controller when the job is done.
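As a back-of-the-envelope illustration of step 3 (hypothetical names; in Helix the rebalancer performs this assignment for you), distributing 1000 tasks round-robin over 5 nodes yields exactly 200 per node:

```java
import java.util.ArrayList;
import java.util.List;

public class EvenDistribution {
    // Assign each task index to a node round-robin. With 1000 tasks and
    // 5 nodes, every node ends up with exactly 1000 / 5 = 200 tasks.
    public static List<List<Integer>> assign(int numTasks, int numNodes) {
        List<List<Integer>> perNode = new ArrayList<>();
        for (int n = 0; n < numNodes; n++) {
            perNode.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            perNode.get(t % numNodes).add(t);
        }
        return perNode;
    }

    public static void main(String[] args) {
        List<List<Integer>> assignment = assign(1000, 5);
        for (int n = 0; n < assignment.size(); n++) {
            System.out.println("node-" + n + ": " + assignment.get(n).size() + " tasks");
        }
    }
}
```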

Can someone help me understand how Helix can solve this kind of problem?
