asterixdb-dev mailing list archives

From Preston Carman <prest...@apache.org>
Subject Re: Re: Let one Operator finished the job before another one begin in Hyracks
Date Wed, 12 Oct 2016 22:01:24 GMT
We have a similar process that is being prepared for determining the
range before sorting. We have introduced a new operator that waits
until the range is determined before allowing the job to continue.

Let's set up a time to talk about the details.
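
For illustration only, the "wait until the value is determined" idea can be sketched in plain Java. This is not the Hyracks operator mentioned above and uses no Hyracks classes. SharedFilter, contribute, and awaitMerged are hypothetical names, threads stand in for partitions, and a BitSet stands in for a Bloom filter. Each partition contributes its local filter (fan-in), and no partition gets the merged result back (fan-out) until all of them have contributed.

```java
import java.util.BitSet;
import java.util.concurrent.CountDownLatch;

class BlockingMergeSketch {
    // Hypothetical shared object; a stand-in for state shared by the partitions of one job.
    static final class SharedFilter {
        private final BitSet merged = new BitSet();
        private final CountDownLatch pending;

        SharedFilter(int partitions) {
            this.pending = new CountDownLatch(partitions);
        }

        // Fan-in: each partition ORs its local filter into the shared one, exactly once.
        void contribute(BitSet local) {
            synchronized (merged) {
                merged.or(local);
            }
            pending.countDown();
        }

        // Fan-out: blocks until every partition has contributed, then returns a copy.
        BitSet awaitMerged() throws InterruptedException {
            pending.await();
            synchronized (merged) {
                return (BitSet) merged.clone();
            }
        }
    }

    // Runs one thread per partition and returns how many bits each one saw in the merged filter.
    static int[] mergedCardinalities(int partitions) throws InterruptedException {
        SharedFilter shared = new SharedFilter(partitions);
        int[] seen = new int[partitions];
        Thread[] workers = new Thread[partitions];
        for (int p = 0; p < partitions; p++) {
            final int id = p;
            workers[p] = new Thread(() -> {
                BitSet local = new BitSet();
                local.set(id); // each partition sets one distinct bit
                shared.contribute(local);
                try {
                    seen[id] = shared.awaitMerged().cardinality();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[p].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int c : mergedCardinalities(4)) {
            System.out.println("bits seen in merged filter: " + c);
        }
    }
}
```

Because every partition blocks in awaitMerged until the latch reaches zero, none of them can observe a partially merged filter.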

On Tue, Oct 11, 2016 at 7:21 PM, 李文海 <lwh@whu.edu.cn> wrote:
>
>
>
>> -----Original Message-----
>> From: "Yingyi Bu" <buyingyi@gmail.com>
>> Sent: Wednesday, October 12, 2016
>> To: dev@asterixdb.apache.org
>> Cc:
>> Subject: Re: Let one Operator finished the job before another one begin in Hyracks
>>
>> +1!
>>
>> Best,
>> Yingyi
>>
>> On Tue, Oct 11, 2016 at 9:32 AM, Mike Carey <dtabass@gmail.com> wrote:
>>
>> > BUT AGAIN:  I think the preferred solution in this case is to do it in one
>> > job.  Mingda, I would suggest sync'ing up with Wenhai for a Skype meeting
>> > on how he/Preston have done essentially the very same thing in their use
>> > cases for parallel sorts and interval joins.  Hyracks has everything needed
>> > for this, as it turns out, without a multi-job need.
>> >
>> >
>> >
>> > On 10/11/16 9:26 AM, Yingyi Bu wrote:
>> >
>> >> You can search the usage of waitForCompletion in the code base, e.g.:
>> >>
>> >> APIFramework.java:
>> >>
>> >> public void executeJobArray(IHyracksClientConnection hcc,
>> >>         JobSpecification[] specs, PrintWriter out) throws Exception {
>> >>     for (JobSpecification spec : specs) {
>> >>         spec.setMaxReattempts(0);
>> >>         // Submit the job and keep its id.
>> >>         JobId jobId = hcc.startJob(spec);
>> >>         long startTime = System.currentTimeMillis();
>> >>         // Block until this job finishes before starting the next one.
>> >>         hcc.waitForCompletion(jobId);
>> >>         long endTime = System.currentTimeMillis();
>> >>         double duration = (endTime - startTime) / 1000.00;
>> >>         out.println("<pre>Duration: " + duration + " sec</pre>");
>> >>     }
>> >> }
>> >>
>> >>
>> >> You start a job and get the job Id, and then you can wait on the job id.
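
As a rough, library-free sketch of that start-then-wait sequence: the code below uses java.util.concurrent in place of the Hyracks client, so ExecutorService.submit only plays the role of startJob and Future.get only plays the role of waitForCompletion. SequentialJobsSketch, runInOrder, and the job names are hypothetical. The second job is not submitted until the first one has finished.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SequentialJobsSketch {
    // Runs the named jobs strictly one after another and returns the event log.
    static List<String> runInOrder(List<String> jobNames)
            throws InterruptedException, ExecutionException {
        ExecutorService cluster = Executors.newFixedThreadPool(2); // stand-in for the cluster
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        try {
            for (String name : jobNames) {
                // Plays the role of startJob: returns a handle without waiting.
                Future<?> handle = cluster.submit(() -> {
                    log.add("start:" + name);
                    log.add("end:" + name);
                });
                // Plays the role of waitForCompletion: block before the next submission.
                handle.get();
            }
        } finally {
            cluster.shutdown();
        }
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runInOrder(Arrays.asList("buildBF", "sendBF")));
    }
}
```

Even with two worker threads available, the blocking get() between submissions keeps the jobs from overlapping.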
>> >>
>> >>
>> >> Best,
>> >>
>> >> Yingyi
>> >>
>> >>
>> >> On Tue, Oct 11, 2016 at 1:45 AM, 李文海 <lwh@whu.edu.cn> wrote:
>> >>
>> >> Hi, Mingda.
>> >>>      What you need is quite similar to what Preston and I have done.
>> >>> Actually, I think we just need a shared object held by the joblet or
>> >>> task, which should also be driven by a broadcast connector in between
>> >>> its input and output operators. We can talk about this over Skype if
>> >>> needed.
>> >>> Best, Wenhai
>> >>>
>> >>>
>> >>> -----Original Message-----
>> >>>> From: "Mike Carey" <dtabass@gmail.com>
>> >>>> Sent: Tuesday, October 11, 2016
>> >>>> To: dev@asterixdb.apache.org
>> >>>> Cc:
>> >>>> Subject: Re: Let one Operator finished the job before another one begin in Hyracks
>> >>>
>> >>>> And both Wenhai and Preston have examples of doing the
>> >>>> fan-in-and-compute/fan-back-out pattern with blocking until the latter
>> >>>> part is done - Wenhai for finding range split points for parallel
>> >>>> sorting and Preston for similar things that arise in interval joins.
>> >>>> Can you guys chime in when you have a chance?  (Preston may be busy from
>> >>>> what I saw on Skype on Friday :-), with congrats being due!)
>> >>>>
>> >>>>
>> >>>> On 10/11/16 12:22 AM, Jianfeng Jia wrote:
>> >>>>
>> >>>>> Based on the described example, it seems possible to implement it in
>> >>>>> one job by using MToNPartitioningConnectorDescriptor.
>> >>>>> You can force the merge-BF operator to run in only one partition by
>> >>>>> using the PartitionConstraintHelper.addAbsoluteLocationConstraint()
>> >>>>> function.
>> >>>
>> >>>>> On Oct 10, 2016, at 11:43 PM, mingda li <limingda1993@gmail.com> wrote:
>> >>>
>> >>>>>> Yeah, that will be easier. But for example, we have N nodes, and each
>> >>>>>> node will generate a Bloom Filter (BF) for its own data. We need to
>> >>>>>> send these BFs to one node to construct a complete BF, and then send
>> >>>>>> that BF back to each node. I am not sure we can use a multiple-stage
>> >>>>>> job for this, because there should be a 1->N and an N->1 connector
>> >>>>>> among the nodes. If it is all in one job, there may be no way to
>> >>>>>> transfer data among nodes.
>> >>>>>> This is my idea. If this can be implemented by one multiple-stage job,
>> >>>>>> that will save me a lot of work :-)
>> >>>>>>
>> >>>>>> Bests,
>> >>>>>> Mingda
>> >>>>>>
>> >>>>>> On Mon, Oct 10, 2016 at 8:59 PM, Mike Carey <dtabass@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Is there a reason for wanting two jobs?  I would think that one
>> >>>>>>> multiple-stage job would be preferable.
>> >>>>>>>
>> >>>>>>> On Oct 10, 2016 1:21 PM, "mingda li" <limingda1993@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> Oh, thanks Kim~
>> >>>>>>>>
>> >>>>>>>> On Mon, Oct 10, 2016 at 12:55 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> Forwarded to dev.
>> >>>>>>>>>
>> >>>>>>>>> Best,
>> >>>>>>>>> Taewoo
>> >>>>>>>>>
>> >>>>>>>>> ---------- Forwarded message ----------
>> >>>>>>>>> From: mingda li <limingda1993@gmail.com>
>> >>>>>>>>> Date: Mon, Oct 10, 2016 at 11:21 AM
>> >>>>>>>>> Subject: Let one Operator finished the job before another one begin in Hyracks
>> >>>>>>>>> To: users@asterixdb.apache.org
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Hi,
>> >>>>>>>>>
>> >>>>>>>>> Now, I am trying to build a Bloom Filter (BF) before a join. The BF is
>> >>>>>>>>> built in each node and sent to one node to be combined. I want to set
>> >>>>>>>>> a stop sign there before sending the BF from each node. The stop sign
>> >>>>>>>>> means a node can only send its BF after it is built.
>> >>>>>>>>> The method HyracksConnection.waitForCompletion may help with this, but
>> >>>>>>>>> I am not sure how to use it.
>> >>>>>>>>> Should I build two jobs: hcc.waitForCompletion(jobBuildBF);
>> >>>>>>>>> jobidSendBF=hcc.startJob(); ?
>> >>>>>>>>> Has anyone ever used HyracksConnection.waitForCompletion?
>> >>>>>>>>>
>> >>>>>>>>> Thanks,
>> >>>>>>>>> Mingda
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>
>> >
>
