asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Let one Operator finished the job before another one begin in Hyracks
Date Tue, 11 Oct 2016 18:50:30 GMT
I think there will also be some helpful design documents that he and 
Preston can both share (with pics of the Hyracks jobs and the 
activities/stages involved).



On 10/11/16 10:52 AM, mingda li wrote:
> Oh, thanks for all the explanations :-)
> I will talk with Wenhai about how they implemented this and try to
> do it in one job.
>
> Bests,
> Mingda
>
> On Tue, Oct 11, 2016 at 9:52 AM, 李文海 <lwh@whu.edu.cn> wrote:
>
>>
>>
>>> -----Original Message-----
>>> From: "Yingyi Bu" <buyingyi@gmail.com>
>>> Sent: Wednesday, October 12, 2016
>>> To: dev@asterixdb.apache.org
>>> Cc:
>>> Subject: Re: Let one Operator finished the job before another one begin in Hyracks
>>> +1!
>>>
>>> Best,
>>> Yingyi
>>>
>>> On Tue, Oct 11, 2016 at 9:32 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>
>>>> BUT AGAIN:  I think the preferred solution in this case is to do it in one
>>>> job.  Mingda, I would suggest sync'ing up with Wenhai for a Skype meeting
>>>> on how he/Preston have done essentially the very same thing in their use
>>>> cases for parallel sorts and interval joins.  Hyracks has everything needed
>>>> for this, as it turns out, without a multi-job need.
>>>>
>>>>
>>>>
>>>> On 10/11/16 9:26 AM, Yingyi Bu wrote:
>>>>
>>>>> You can search the usage of waitForCompletion in the code base, e.g.:
>>>>>
>>>>> APIFramework.java:
>>>>>
>>>>> public void executeJobArray(IHyracksClientConnection hcc,
>>>>> JobSpecification[] specs, PrintWriter out)
>>>>>           throws Exception {
>>>>>       for (JobSpecification spec : specs) {
>>>>>           spec.setMaxReattempts(0);
>>>>>           JobId jobId = hcc.startJob(spec);
>>>>>           long startTime = System.currentTimeMillis();
>>>>>           hcc.waitForCompletion(jobId);
>>>>>           long endTime = System.currentTimeMillis();
>>>>>           double duration = (endTime - startTime) / 1000.00;
>>>>>           out.println("<pre>Duration: " + duration + " sec</pre>");
>>>>>       }
>>>>>
>>>>> }
>>>>>
>>>>>
>>>>> You start a job and get the job Id, and then you can wait on the job id.
>>>>>
>>>>> Best,
>>>>>
>>>>> Yingyi
>>>>>
>>>>>
>>>>> On Tue, Oct 11, 2016 at 1:45 AM, 李文海 <lwh@whu.edu.cn> wrote:
>>>>>
>>>>>> Hi, Mingda.
>>>>>>       What you need is quite similar to what Preston and I have done.
>>>>>> Actually, I think we just need a shared object, held by the joblet or
>>>>>> task, that is also driven by a broadcast connector in between its input
>>>>>> and output operators. We can talk about this by Skype if needed.
>>>>>> Best, Wenhai
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: "Mike Carey" <dtabass@gmail.com>
>>>>>>> Sent: Tuesday, October 11, 2016
>>>>>>> To: dev@asterixdb.apache.org
>>>>>>> Cc:
>>>>>>> Subject: Re: Let one Operator finished the job before another one begin in Hyracks
>>>>>>
>>>>>>> And both Wenhai and Preston have examples of doing the
>>>>>>> fan-in-and-compute/fan-back-out pattern with blocking until the latter
>>>>>>> part is done - Wenhai for finding range split points for parallel
>>>>>>> sorting and Preston for similar things that arise in interval joins.
>>>>>>> Can you guys chime in when you have a chance?  (Preston may be busy from
>>>>>>> what I saw on Skype on Friday :-), with congrats being due!)
>>>>>>>
>>>>>>>
>>>>>>> On 10/11/16 12:22 AM, Jianfeng Jia wrote:
>>>>>>>
>>>>>>>> Based on the described example, it seems possible to implement it in
>>>>>>>> one job by using MToNPartitioningConnectorDescriptor.
>>>>>>>> You can force the merge-BF operator to run in only one partition by
>>>>>>>> using the PartitionConstraintHelper.addAbsoluteLocationConstraint()
>>>>>>>> function.
>>>>>>
>>>>>>>> On Oct 10, 2016, at 11:43 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Yeah, that will be easier. But for example, we have N nodes, and each
>>>>>>>>> node will generate a Bloom Filter (BF) for its own data. We need to send
>>>>>>>>> these BFs to one node to construct a complete BF and then send that BF
>>>>>>>>> back to each node. I am not sure we can use a multiple-stage job for
>>>>>>>>> this, because there should be a 1->N and an N->1 connector among the
>>>>>>>>> nodes. If it is done in one job, there may be no way to transfer data
>>>>>>>>> among nodes.
>>>>>>>>> This is my idea. If this can be implemented by one multiple-stage job,
>>>>>>>>> that will save me a lot of work :-)
>>>>>>>>>
>>>>>>>>> Bests,
>>>>>>>>> Mingda
>>>>>>>>>
>>>>>>>>> On Mon, Oct 10, 2016 at 8:59 PM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Is there a reason for wanting two jobs?  I would think that one
>>>>>>>>>> multiple stage job would be preferable.
>>>>>>>>>>
>>>>>>>>>> On Oct 10, 2016 1:21 PM, "mingda li" <limingda1993@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Oh, thanks Kim~
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 10, 2016 at 12:55 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Forwarded to dev.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Taewoo
>>>>>>>>>>>>
>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>> From: mingda li <limingda1993@gmail.com>
>>>>>>>>>>>> Date: Mon, Oct 10, 2016 at 11:21 AM
>>>>>>>>>>>> Subject: Let one Operator finished the job before another one begin in Hyracks
>>>>>>>>>>>> To: users@asterixdb.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am trying to build a Bloom Filter (BF) before a join. The BF is built
>>>>>>>>>>>> on each node and sent to one node to be combined. I want to set a stop
>>>>>>>>>>>> sign on each node before it sends its BF. The stop sign means a node can
>>>>>>>>>>>> only send its BF after the BF is built.
>>>>>>>>>>>> The method HyracksConnection.waitForCompletion may help with this, but
>>>>>>>>>>>> I am not sure how to use it.
>>>>>>>>>>>> Should I build two jobs: hcc.waitForCompletion(jobBuildBF);
>>>>>>>>>>>> jobidSendBF=hcc.startJob(); ?
>>>>>>>>>>>> Has anyone ever used HyracksConnection.waitForCompletion?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Mingda
>>>>>>>>>>>>
>>>>>>>>>>>>
>>
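
For readers following the thread: the data flow Mingda describes (each node
builds a Bloom filter over its own data, one node merges them, and the merged
filter goes back to every node) can be sketched in plain Java without any
Hyracks classes. This is only an illustration of the fan-in/merge/fan-out
logic, not how Wenhai or Preston implemented it. The class name, method names,
filter size, and hash scheme below are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: simulates N "nodes" inside one process.
public class BloomFilterMergeSketch {
    static final int NUM_BITS = 1024;  // assumed filter size
    static final int NUM_HASHES = 3;   // assumed number of hash functions

    // i-th bit position for a key, via simple double hashing (illustrative only).
    public static int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // force the step to be odd
        return Math.floorMod(h1 + i * h2, NUM_BITS);
    }

    // Stage 1: each node builds a filter over its local keys.
    public static BitSet buildLocalFilter(List<String> keys) {
        BitSet bits = new BitSet(NUM_BITS);
        for (String key : keys) {
            for (int i = 0; i < NUM_HASHES; i++) {
                bits.set(position(key, i));
            }
        }
        return bits;
    }

    // Stage 2 (N->1): one node ORs all local filters together.
    // It cannot finish until every local filter has arrived.
    public static BitSet mergeFilters(List<BitSet> locals) {
        BitSet merged = new BitSet(NUM_BITS);
        for (BitSet local : locals) {
            merged.or(local);
        }
        return merged;
    }

    // Membership test: false means definitely absent, true means possibly present.
    public static boolean mightContain(BitSet filter, String key) {
        for (int i = 0; i < NUM_HASHES; i++) {
            if (!filter.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e"));

        List<BitSet> locals = new ArrayList<>();
        for (List<String> partition : partitions) {
            locals.add(buildLocalFilter(partition));
        }

        BitSet merged = mergeFilters(locals);

        // Stage 3 (1->N): every node receives its own copy of the merged filter.
        for (int node = 0; node < partitions.size(); node++) {
            BitSet copy = (BitSet) merged.clone();
            System.out.println("node " + node + " sees key c: " + mightContain(copy, "c"));
        }
    }
}
```

Because merging is a bitwise OR, the merged filter answers "possibly present"
for every key inserted on any node. The thread's question is how to make
stage 2 wait for stage 1 inside a single Hyracks job, and this sketch does not
answer that. It only shows what the stages compute.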

