helix-user mailing list archives

From Maharajan Nachiappa <maharajan.na...@gmail.com>
Subject Fwd: Helix parallelism
Date Wed, 03 Sep 2014 17:49:41 GMT
Hi Kishore/Kanak,

Thanks very much for the guidance. I have tested the feature with a minimal number of nodes and
it works as expected, though I have not done exhaustive testing yet.

I have a question: is there a way to get the resulting data back to the client for consolidation
or aggregation, for example as an optional field alongside the TaskResult object's status and info?
Say, for example, returning 1 or 2 KB of results from each of 5 task participants as an optional
data object. This would be similar to the map-reduce concept, but on a real-time basis, giving the
client the opportunity to consolidate the results.
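For illustration, here is roughly what I have in mind today as a workaround, assuming the
TaskResult info string can carry a small serialized payload (QueryTask and
executeQueryAndSerialize are just placeholder names, not part of the framework):

import org.apache.helix.task.Task;
import org.apache.helix.task.TaskResult;

// Placeholder sketch: return a small serialized result through the
// TaskResult info string, since there is no dedicated data field today.
public class QueryTask implements Task {
  @Override
  public TaskResult run() {
    // executeQueryAndSerialize() is a hypothetical helper that runs the
    // query and serializes 1-2 KB of results (e.g. as JSON)
    String resultJson = executeQueryAndSerialize();
    return new TaskResult(TaskResult.Status.COMPLETED, resultJson);
  }

  @Override
  public void cancel() {
    // best-effort cancellation of the in-flight query
  }

  private String executeQueryAndSerialize() {
    return "{\"rows\":42}"; // stand-in for real query output
  }
}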

Regards,
Maha

On Aug 22, 2014, at 8:05 AM, kishore g <g.kishore@gmail.com> wrote:

Not sure if you are subscribed to the mailing list

---------- Forwarded message ----------
From: "Kanak Biscuitwala" <kanak.b@hotmail.com>
Date: Aug 21, 2014 10:02 AM
Subject: RE: Helix parallelism
To: "user@helix.apache.org" <user@helix.apache.org>
Cc: 

Yes, you can use the task framework, which hasn't been released yet, but will be soon. For
more on the task framework, you can read this blog post: http://engineering.linkedin.com/distributed-systems/ad-hoc-task-management-apache-helix

You can submit a job with 1000 tasks using either Java or YAML.

The YAML specification of this job would look something like:

name: MyWorkflow
jobs:
  - name: RunQueries
    command: RunQuery # The command corresponding to Task callbacks
    jobConfigMap: { # Arbitrary key-value pairs to pass to all tasks in this job
      k1: "v1",
      k2: "v2"
    }
    numConcurrentTasksPerInstance: 200 # Max parallelism per instance
    tasks: # Schedule 1000 tasks, each responsible for aggregating requests for a chunk of partitions
      - taskConfigMap: { # Arbitrary key-value pairs to pass to this task
          query: "query1"
        }
      - taskConfigMap: {
          query: "query2"
        }
      - taskConfigMap: {
          query: "query3"
        } # Repeat for the remaining 997 tasks


You can also see this class for an example of how to build jobs in Java: https://github.com/apache/helix/blob/master/helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java
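
A rough Java sketch of the same job might look like the following. It assumes the TaskDriver and
Workflow.Builder APIs from the task framework; exact builder method names may differ between
versions, so treat this as illustrative rather than definitive:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.helix.HelixManager;
import org.apache.helix.task.JobConfig;
import org.apache.helix.task.TaskConfig;
import org.apache.helix.task.TaskDriver;
import org.apache.helix.task.Workflow;

public class SubmitQueries {
  public static void submit(HelixManager manager) {
    // One TaskConfig per query; "RunQuery" is the registered Task command
    List<TaskConfig> taskConfigs = new ArrayList<TaskConfig>();
    for (int i = 1; i <= 1000; i++) {
      Map<String, String> cfg = new HashMap<String, String>();
      cfg.put("query", "query" + i);
      taskConfigs.add(new TaskConfig("RunQuery", cfg));
    }

    JobConfig.Builder job = new JobConfig.Builder()
        .setCommand("RunQuery")
        .addTaskConfigs(taskConfigs)
        .setNumConcurrentTasksPerInstance(200); // cap parallelism per node

    Workflow.Builder workflow = new Workflow.Builder("MyWorkflow")
        .addJob("RunQueries", job);

    // TaskDriver submits the workflow to the cluster
    new TaskDriver(manager).start(workflow.build());
  }
}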

Then you just need to implement a Task callback and register it on each of the instances,
and Helix will take care of assignment and retries.
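
For the participant side, a minimal registration sketch (assuming the TaskStateModelFactory API;
the cluster name, instance name, and ZooKeeper address below are placeholders, and QueryTask is
whatever Task implementation you write):

import java.util.HashMap;
import java.util.Map;

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.task.Task;
import org.apache.helix.task.TaskCallbackContext;
import org.apache.helix.task.TaskFactory;
import org.apache.helix.task.TaskStateModelFactory;

public class Participant {
  public static void main(String[] args) throws Exception {
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "MyCluster", "localhost_12913", InstanceType.PARTICIPANT, "localhost:2181");

    // Map the "RunQuery" command to a Task implementation
    Map<String, TaskFactory> factories = new HashMap<String, TaskFactory>();
    factories.put("RunQuery", new TaskFactory() {
      @Override
      public Task createNewTask(TaskCallbackContext context) {
        return new QueryTask(); // placeholder Task implementation
      }
    });

    // Register the task state model so Helix can drive task transitions
    manager.getStateMachineEngine().registerStateModelFactory(
        "Task", new TaskStateModelFactory(manager, factories));
    manager.connect();
  }
}
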
Date: Thu, 21 Aug 2014 09:07:11 -0700
Subject: Helix parallelism
From: maharajan.nachi@gmail.com
To: user@helix.apache.org

Hi,

I just started looking at Helix's capability to execute tasks in parallel, spread evenly across
the cluster's instances and resources.

I have a requirement to execute many different queries in parallel. Can Helix help in this case?

For example:
1. I have about 1000 different queries to be executed.
2. I have 5 nodes configured in the Helix cluster, each capable of executing a set of queries.
3. I need Helix to distribute these 1000 different queries evenly across the 5 nodes (200 per
node), take care of re-executing any failed queries, and notify the controller when the job is
done.

Can someone help me understand how Helix can solve this kind of problem?

Regards,
Maha