asterixdb-dev mailing list archives

From Yingyi Bu <buyin...@gmail.com>
Subject Re: Let one Operator finished the job before another one begin in Hyracks
Date Tue, 11 Oct 2016 16:35:10 GMT
+1!

Best,
Yingyi

On Tue, Oct 11, 2016 at 9:32 AM, Mike Carey <dtabass@gmail.com> wrote:

> BUT AGAIN:  I think the preferred solution in this case is to do it in one
> job.  Mingda, I would suggest sync'ing up with Wenhai for a Skype meeting
> on how he/Preston have done essentially the very same thing in their use
> cases for parallel sorts and interval joins.  Hyracks has everything needed
> for this, as it turns out, without a multi-job need.
>
>
>
> On 10/11/16 9:26 AM, Yingyi Bu wrote:
>
>> You can search the usage of waitForCompletion in the code base, e.g.:
>>
>> APIFramework.java:
>>
>> public void executeJobArray(IHyracksClientConnection hcc,
>>         JobSpecification[] specs, PrintWriter out) throws Exception {
>>     for (JobSpecification spec : specs) {
>>         spec.setMaxReattempts(0);
>>         JobId jobId = hcc.startJob(spec);
>>         long startTime = System.currentTimeMillis();
>>         hcc.waitForCompletion(jobId);
>>         long endTime = System.currentTimeMillis();
>>         double duration = (endTime - startTime) / 1000.00;
>>         out.println("<pre>Duration: " + duration + " sec</pre>");
>>     }
>> }
>>
>>
>> You start a job, get back its job ID, and then wait on that ID.
>>
>>
>> Best,
>>
>> Yingyi
>>
>>
>> On Tue, Oct 11, 2016 at 1:45 AM, 李文海 <lwh@whu.edu.cn> wrote:
>>
>>> Hi, Mingda.
>>>      What you need is quite similar to what Preston and I have done.
>>> Actually, I think we just need a shared object held by a joblet or
>>> task, which should also be driven by a broadcast connector between its
>>> input and output operators. We can talk about this over Skype if needed.
>>> Best, Wenhai
>>>
>>>
>>> -----Original Message-----
>>>> From: "Mike Carey" <dtabass@gmail.com>
>>>> Sent: Tuesday, October 11, 2016
>>>> To: dev@asterixdb.apache.org
>>>> Cc:
>>>> Subject: Re: Let one Operator finished the job before another one begin in Hyracks
>>>
>>>> And both Wenhai and Preston have examples of doing the
>>>> fan-in-and-compute/fan-back-out pattern with blocking until the latter
>>>> part is done - Wenhai for finding range split points for parallel
>>>> sorting and Preston for similar things that arise in interval joins.
>>>> Can you guys chime in when you have a chance?  (Preston may be busy from
>>>> what I saw on Skype on Friday :-), with congrats being due!)
>>>>
>>>>
>>>> On 10/11/16 12:22 AM, Jianfeng Jia wrote:
>>>>
>>>>> Based on the described example, it seems possible to implement it in
>>>>> one job by using MToNPartitioningConnectorDescriptor.
>>>>> You can force the merge-BF operator to run in only one partition by
>>>>> using the PartitionConstraintHelper.addAbsoluteLocationConstraint()
>>>>> function.
>>>
>>>>> On Oct 10, 2016, at 11:43 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>>
>>>>>> Yeah, that will be easier. But for example, we have N nodes, and each
>>>>>> node generates a Bloom filter (BF) for its own data. We need to send
>>>>>> these BFs to one node to construct a complete BF, and then send the
>>>>>> complete BF back to each node. I am not sure we can use a
>>>>>> multiple-stage job for this, because there has to be a 1->N and an
>>>>>> N->1 connector among the nodes. In one job, there may be no way to
>>>>>> transfer data among nodes.
>>>>>> This is my idea. If this can be implemented as one multiple-stage job,
>>>>>> that will save me a lot of work :-)
>>>>>>
>>>>>> Bests,
>>>>>> Mingda
>>>>>>
>>>>>> On Mon, Oct 10, 2016 at 8:59 PM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>
>>>>>>> Is there a reason for wanting two jobs?  I would think that one
>>>>>>> multiple-stage job would be preferable.
>>>>>>>
>>>>>>> On Oct 10, 2016 1:21 PM, "mingda li" <limingda1993@gmail.com> wrote:
>>>>>>>
>>>>>>>> Oh, thanks Kim~
>>>>>>>>
>>>>>>>> On Mon, Oct 10, 2016 at 12:55 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Forwarded to dev.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Taewoo
>>>>>>>>>
>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>> From: mingda li <limingda1993@gmail.com>
>>>>>>>>> Date: Mon, Oct 10, 2016 at 11:21 AM
>>>>>>>>> Subject: Let one Operator finished the job before another one begin in Hyracks
>>>>>>>>> To: users@asterixdb.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Now, I am trying to build a Bloom filter (BF) before a join. The BF
>>>>>>>>> is built in each node and sent to one node to be combined. I want
>>>>>>>>> to set a stop sign there before sending the BF from each node. The
>>>>>>>>> stop sign means a node can only send its BF after it is fully built.
>>>>>>>>> The method HyracksConnection.waitForCompletion may help with this,
>>>>>>>>> but I am not sure how to use it.
>>>>>>>>> Should I build two jobs: hcc.waitForCompletion(jobBuildBF);
>>>>>>>>> jobidSendBF=hcc.startJob(); ?
>>>>>>>>> Has anyone ever used HyracksConnection.waitForCompletion?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Mingda
>>>>>>>>>
>>>>>>>>>
>>>
>
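
For readers following the thread, the fan-in-and-compute/fan-back-out pattern discussed above can be sketched in plain Java. This is only an illustration under stated assumptions, not Hyracks code: a thread pool stands in for the N nodes, and the class name BloomFilterMergeSketch, the constants BITS and HASHES, and the double-hashing scheme are all hypothetical choices made for the example. It shows the two properties Mingda asked about: the merge step cannot proceed until every local BF is fully built, and the merged BF is just the bitwise OR of the local ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-alone sketch; threads play the role of nodes.
public class BloomFilterMergeSketch {
    static final int BITS = 1024; // filter size in bits (assumed)
    static final int HASHES = 3;  // hash functions per key (assumed)

    // i-th bit position for a key, derived by simple double hashing.
    static int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd, non-negative step
        return Math.floorMod(h1 + i * h2, BITS);
    }

    // Each "node" builds a filter over its own partition of the data.
    static BitSet buildLocal(List<String> keys) {
        BitSet bf = new BitSet(BITS);
        for (String key : keys) {
            for (int i = 0; i < HASHES; i++) {
                bf.set(position(key, i));
            }
        }
        return bf;
    }

    static boolean mightContain(BitSet bf, String key) {
        for (int i = 0; i < HASHES; i++) {
            if (!bf.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Fan-in (N->1): OR all local filters into one. Future.get() is the
    // "stop sign": it blocks until that partition's filter is complete.
    // Expects at least one partition.
    static BitSet buildGlobal(List<List<String>> partitions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<BitSet>> locals = new ArrayList<>();
            for (List<String> partition : partitions) {
                locals.add(pool.submit(() -> buildLocal(partition)));
            }
            BitSet global = new BitSet(BITS);
            for (Future<BitSet> local : locals) {
                global.or(local.get());
            }
            return global;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e"));
        BitSet global = buildGlobal(partitions);
        // Fan-out (1->N): every partition receives its own copy.
        for (int p = 0; p < partitions.size(); p++) {
            BitSet copy = (BitSet) global.clone();
            System.out.println("partition " + p + " sees c: " + mightContain(copy, "c"));
        }
    }
}
```

In a single Hyracks job, the same shape would presumably come from the pieces named in the thread: a connector feeding a merge operator pinned to one partition, followed by a broadcast back to all partitions. The sketch above does not attempt to show those APIs.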
