asterixdb-dev mailing list archives

From 李文海 <...@whu.edu.cn>
Subject Re: Re: Let one Operator finished the job before another one begin in Hyracks
Date Wed, 12 Oct 2016 02:21:50 GMT



> -----Original Message-----
> From: "Yingyi Bu" <buyingyi@gmail.com>
> Sent: Wednesday, October 12, 2016
> To: dev@asterixdb.apache.org
> Cc: 
> Subject: Re: Let one Operator finished the job before another one begin in Hyracks
> 
> +1!
> 
> Best,
> Yingyi
> 
> On Tue, Oct 11, 2016 at 9:32 AM, Mike Carey <dtabass@gmail.com> wrote:
> 
> > BUT AGAIN:  I think the preferred solution in this case is to do it in one
> > job.  Mingda, I would suggest sync'ing up with Wenhai for a Skype meeting
> > on how he/Preston have done essentially the very same thing in their use
> > cases for parallel sorts and interval joins.  Hyracks has everything needed
> > for this, as it turns out, without a multi-job need.
> >
> >
> >
> > On 10/11/16 9:26 AM, Yingyi Bu wrote:
> >
> >> You can search the usage of waitForCompletion in the code base, e.g.:
> >>
> >> APIFramework.java:
> >>
> >> public void executeJobArray(IHyracksClientConnection hcc,
> >>         JobSpecification[] specs, PrintWriter out) throws Exception {
> >>     for (JobSpecification spec : specs) {
> >>         spec.setMaxReattempts(0);
> >>         // Submit the job and keep its id.
> >>         JobId jobId = hcc.startJob(spec);
> >>         long startTime = System.currentTimeMillis();
> >>         // Wait on the job id until the job completes.
> >>         hcc.waitForCompletion(jobId);
> >>         long endTime = System.currentTimeMillis();
> >>         double duration = (endTime - startTime) / 1000.00;
> >>         out.println("<pre>Duration: " + duration + " sec</pre>");
> >>     }
> >> }
> >>
> >>
> >> You start a job and get the job Id, and then you can wait on the job id.
> >>
> >>
> >> Best,
> >>
> >> Yingyi
> >>
> >>
> >> On Tue, Oct 11, 2016 at 1:45 AM, 李文海 <lwh@whu.edu.cn> wrote:
> >>
> >>> Hi, Mingda.
> >>>      What you need is quite similar to what Preston and I have done.
> >>> Actually, I think we just need a shared object held by the joblet or
> >>> task, which should also be driven by a broadcast connector in between
> >>> its input and output operators. We can talk about this by Skype if needed.
> >>> Best, Wenhai
> >>>
> >>>
> >>> -----Original Message-----
> >>>> From: "Mike Carey" <dtabass@gmail.com>
> >>>> Sent: Tuesday, October 11, 2016
> >>>> To: dev@asterixdb.apache.org
> >>>> Cc:
> >>>> Subject: Re: Let one Operator finished the job before another one begin in Hyracks
> >>>
> >>>> And both Wenhai and Preston have examples of doing the
> >>>> fan-in-and-compute/fan-back-out pattern with blocking until the latter
> >>>> part is done - Wenhai for finding range split points for parallel
> >>>> sorting and Preston for similar things that arise in interval joins.
> >>>> Can you guys chime in when you have a chance?  (Preston may be busy from
> >>>> what I saw on Skype on Friday :-), with congrats being due!)
> >>>>
> >>>>
> >>>> On 10/11/16 12:22 AM, Jianfeng Jia wrote:
> >>>>
> >>>>> Based on the described example, it seems possible to implement it in
> >>>>> one job by using MToNPartitioningConnectorDescriptor.
> >>>>> You can force the merge-BF operator to run in only one partition by
> >>>>> using the PartitionConstraintHelper.addAbsoluteLocationConstraint()
> >>>>> function.
> >>>>>
> >>>>> On Oct 10, 2016, at 11:43 PM, mingda li <limingda1993@gmail.com> wrote:
> >>>
> >>>>>> Yeah, that will be easier. But for example, we have N nodes, and
> >>>>>> each node will generate a Bloom Filter (BF) for its own data. We need
> >>>>>> to send these BFs to one node to construct a complete BF and then send
> >>>>>> the complete BF back to each node. I am not sure we can use a
> >>>>>> multiple-stage job for this, because there should be a 1->N and an
> >>>>>> N->1 connector among nodes. If it is all in one job, there may be no
> >>>>>> way to transfer data among nodes.
> >>>>>> This is my idea. If this can be implemented by one multiple-stage
> >>>>>> job, that will save me a lot of work :-)
> >>>>>>
> >>>>>> Bests,
> >>>>>> Mingda
> >>>>>>
> >>>>>> On Mon, Oct 10, 2016 at 8:59 PM, Mike Carey <dtabass@gmail.com> wrote:
> >>>>>>
> >>>>>>> Is there a reason for wanting two jobs?  I would think that one
> >>>>>>> multiple stage job would be preferable.
> >>>>>>>
> >>>>>>> On Oct 10, 2016 1:21 PM, "mingda li" <limingda1993@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Oh, thanks Kim~
> >>>>>>>>
> >>>>>>>> On Mon, Oct 10, 2016 at 12:55 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Forwarded to dev.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Taewoo
> >>>>>>>>>
> >>>>>>>>> ---------- Forwarded message ----------
> >>>>>>>>> From: mingda li <limingda1993@gmail.com>
> >>>>>>>>> Date: Mon, Oct 10, 2016 at 11:21 AM
> >>>>>>>>> Subject: Let one Operator finished the job before another one begin in Hyracks
> >>>>>>>>> To: users@asterixdb.apache.org
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I am now trying to build a Bloom Filter (BF) before a join. The BF is
> >>>>>>>>> built on each node and sent to one node to be combined. I want to set
> >>>>>>>>> a stop sign before sending the BF from each node. The stop sign means
> >>>>>>>>> a node can only send its BF after the BF is built.
> >>>>>>>>> The method HyracksConnection.waitForCompletion may help with this, but
> >>>>>>>>> I am not sure how to use it.
> >>>>>>>>> Should I build two jobs: hcc.waitForCompletion(jobBuildBF);
> >>>>>>>>> jobidSendBF=hcc.startJob(); ?
> >>>>>>>>> Has anyone ever used HyracksConnection.waitForCompletion?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Mingda
> >>>>>>>>>
> >>>>>>>>>
> >>>
> >
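
For reference, the combine step discussed in this thread (each node builds a
BF over its own data, one node merges them, and the merged BF goes back to
every node) comes down to a bitwise OR over equally sized bit arrays. Below is
a minimal sketch in plain Java under stated assumptions: SimpleBloomFilter and
all of its methods are hypothetical names made up for illustration, the hash
scheme is a simple double-hashing stand-in, and nothing here is Hyracks or
AsterixDB code. In a one-job plan, merge() would be the body of the operator
pinned to a single partition.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical, simplified Bloom filter; not part of Hyracks or AsterixDB. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
    }

    // Derive the i-th probe position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // force an odd step so probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Called on each node while scanning its local data.
    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    // False means "definitely absent"; true means "possibly present".
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Merge step for the single combining partition: OR all partial filters.
    // Only valid when every filter uses the same size and hash count.
    static SimpleBloomFilter merge(List<SimpleBloomFilter> partials) {
        if (partials.isEmpty()) {
            throw new IllegalArgumentException("need at least one partial filter");
        }
        SimpleBloomFilter first = partials.get(0);
        SimpleBloomFilter merged = new SimpleBloomFilter(first.numBits, first.numHashes);
        for (SimpleBloomFilter p : partials) {
            if (p.numBits != merged.numBits || p.numHashes != merged.numHashes) {
                throw new IllegalArgumentException("filters must share size and hash count");
            }
            merged.bits.or(p.bits);
        }
        return merged;
    }
}
```

Because OR only ever sets bits, the merged filter reports "possibly present"
for every key that any node added, which is what the join-side probe needs.
The merge cannot start emitting a result until all partial filters have
arrived, which is the blocking behavior the thread is asking about.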

