asterixdb-dev mailing list archives

From mingda li <limingda1...@gmail.com>
Subject Re: Let one Operator finished the job before another one begin in Hyracks
Date Tue, 11 Oct 2016 22:44:23 GMT
Oh, I see. Thanks :-)



On Tue, Oct 11, 2016 at 11:50 AM, Mike Carey <dtabass@gmail.com> wrote:

> I think there will also be some helpful design documents that he and
> Preston can both share (with pics of the Hyracks jobs and the
> activities/stages involved).
>
>
>
>
> On 10/11/16 10:52 AM, mingda li wrote:
>
>> Oh, thanks for all the explanations :-)
>> I will talk with Wenhai about how they implemented this and try to
>> finish it in one job.
>>
>> Bests,
>> Mingda
>>
>> On Tue, Oct 11, 2016 at 9:52 AM, 李文海 <lwh@whu.edu.cn> wrote:
>>
>>
>>>
>>> -----Original Message-----
>>>> From: "Yingyi Bu" <buyingyi@gmail.com>
>>>> Sent: Wednesday, October 12, 2016
>>>> To: dev@asterixdb.apache.org
>>>> Cc:
>>>> Subject: Re: Let one Operator finished the job before another one begin in Hyracks
>>>
>>>> +1!
>>>>
>>>> Best,
>>>> Yingyi
>>>>
>>>> On Tue, Oct 11, 2016 at 9:32 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>>
>>>>> BUT AGAIN:  I think the preferred solution in this case is to do it in
>>>>> one job.  Mingda, I would suggest sync'ing up with Wenhai for a Skype
>>>>> meeting on how he/Preston have done essentially the very same thing in
>>>>> their use cases for parallel sorts and interval joins.  Hyracks has
>>>>> everything needed for this, as it turns out, without a multi-job need.
>>>>>
>>>>>
>>>>>
>>>>> On 10/11/16 9:26 AM, Yingyi Bu wrote:
>>>>>
>>>>>> You can search for usages of waitForCompletion in the code base, e.g.:
>>>>>>
>>>>>> APIFramework.java:
>>>>>>
>>>>>> public void executeJobArray(IHyracksClientConnection hcc,
>>>>>>         JobSpecification[] specs, PrintWriter out) throws Exception {
>>>>>>     for (JobSpecification spec : specs) {
>>>>>>         spec.setMaxReattempts(0);
>>>>>>         JobId jobId = hcc.startJob(spec);
>>>>>>         long startTime = System.currentTimeMillis();
>>>>>>         // Block until this job finishes before starting the next one.
>>>>>>         hcc.waitForCompletion(jobId);
>>>>>>         long endTime = System.currentTimeMillis();
>>>>>>         double duration = (endTime - startTime) / 1000.00;
>>>>>>         out.println("<pre>Duration: " + duration + " sec</pre>");
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> You start a job and get the job ID, and then you can wait on the job ID.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Yingyi
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 11, 2016 at 1:45 AM, 李文海 <lwh@whu.edu.cn> wrote:
>>>>>>
>>>>>>> Hi, Mingda.
>>>>>>>
>>>>>>> What you need is quite similar to what Preston and I have done.
>>>>>>> Actually, I think we just need a shared object hosted by a joblet or
>>>>>>> task, which should also be driven by a broadcast connector in between
>>>>>>> its input and output operators. We can talk about this over Skype if
>>>>>>> needed.
>>>>>>>
>>>>>>> Best, Wenhai
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>>> From: "Mike Carey" <dtabass@gmail.com>
>>>>>>>> Sent: Tuesday, October 11, 2016
>>>>>>>> To: dev@asterixdb.apache.org
>>>>>>>> Cc:
>>>>>>>> Subject: Re: Let one Operator finished the job before another one begin in Hyracks
>>>>>>>
>>>>>>>> And both Wenhai and Preston have examples of doing the
>>>>>>>> fan-in-and-compute/fan-back-out pattern with blocking until the
>>>>>>>> latter part is done - Wenhai for finding range split points for
>>>>>>>> parallel sorting and Preston for similar things that arise in
>>>>>>>> interval joins.  Can you guys chime in when you have a chance?
>>>>>>>> (Preston may be busy from what I saw on Skype on Friday :-), with
>>>>>>>> congrats being due!)
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/11/16 12:22 AM, Jianfeng Jia wrote:
>>>>>>>>
>>>>>>>>> Based on the described example, it seems possible to implement it
>>>>>>>>> in one job by using MToNPartitioningConnectorDescriptor.
>>>>>>>>> You can force the merge-BF operator to run in only one partition by
>>>>>>>>> using the PartitionConstraintHelper.addAbsoluteLocationConstraint()
>>>>>>>>> function.
>>>>>>>
>>>>>>>>> On Oct 10, 2016, at 11:43 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yeah, that will be easier. But for example, we have N nodes, and
>>>>>>>>>> each node will generate a Bloom filter (BF) for its own data. We
>>>>>>>>>> need to send these BFs to one node to construct a complete BF and
>>>>>>>>>> then send that BF back to each node. I am not sure we can use a
>>>>>>>>>> multiple-stage job for this, because there would have to be a 1->N
>>>>>>>>>> and an N->1 connector among the nodes. Within one job, there may be
>>>>>>>>>> no way to transfer data among nodes.
>>>>>>>>>>
>>>>>>>>>> This is my idea. If this can be implemented as one multiple-stage
>>>>>>>>>> job, that will save me a lot of work :-)
>>>>>>>>>>
>>>>>>>>>> Bests,
>>>>>>>>>> Mingda
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 10, 2016 at 8:59 PM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Is there a reason for wanting two jobs?  I would think that one
>>>>>>>>>>> multiple stage job would be preferable.
>>>>>>>>
>>>>>>>>>>> On Oct 10, 2016 1:21 PM, "mingda li" <limingda1993@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Oh, thanks Kim~
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 10, 2016 at 12:55 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Forwarded to dev.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Taewoo
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>> From: mingda li <limingda1993@gmail.com>
>>>>>>>>>>>>> Date: Mon, Oct 10, 2016 at 11:21 AM
>>>>>>>>>>>>> Subject: Let one Operator finished the job before another one begin in Hyracks
>>>>>>>>>>>>> To: users@asterixdb.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now, I am trying to build a Bloom filter (BF) before a join. The BF
>>>>>>>>>>>>> is built on each node and sent to one node to be combined. I want
>>>>>>>>>>>>> to put a stop sign in front of the send on each node, meaning a
>>>>>>>>>>>>> node can only send its BF after the BF has been fully built.
>>>>>>>>>>>>>
>>>>>>>>>>>>> HyracksConnection.waitForCompletion may help with this, but I am
>>>>>>>>>>>>> not sure how to use it.
>>>>>>>>>>>>> Should I build two jobs: hcc.waitForCompletion(jobBuildBF);
>>>>>>>>>>>>> jobidSendBF=hcc.startJob(); ?
>>>>>>>>>>>>> Has anyone ever used HyracksConnection.waitForCompletion?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Mingda
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>
>
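
For reference, the Bloom filter step discussed in this thread (each node
builds a local BF, one node combines them, and the combined BF goes back to
every node) can be sketched in plain Java. This is only an illustration and
uses no Hyracks classes: the class name, filter size, hash count, and hashing
scheme are all assumptions made for the example, and in a real job the N->1
and 1->N hops would be carried by connectors rather than method calls. The
sketch relies on one property: Bloom filters built with the same size and the
same hash functions can be combined with a bitwise OR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Illustration only: plain Java, no Hyracks classes. All names are hypothetical. */
final class BloomFilterMergeSketch {
    static final int NUM_BITS = 1 << 10; // assumed filter size, identical on every node
    static final int NUM_HASHES = 3;     // assumed number of hash functions

    // Bit position of the i-th probe for a key (simple double hashing, an assumption).
    static int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd step so successive probes differ
        return Math.floorMod(h1 + i * h2, NUM_BITS);
    }

    // Step 1 (on every node): build a local filter over that node's keys.
    static BitSet buildLocal(List<String> keys) {
        BitSet bits = new BitSet(NUM_BITS);
        for (String key : keys) {
            for (int i = 0; i < NUM_HASHES; i++) {
                bits.set(position(key, i));
            }
        }
        return bits;
    }

    // Step 2 (on one node, after the N->1 fan-in): OR all local filters together.
    static BitSet merge(List<BitSet> locals) {
        BitSet merged = new BitSet(NUM_BITS);
        for (BitSet local : locals) {
            merged.or(local);
        }
        return merged;
    }

    // Step 3 (on every node, after the 1->N broadcast): probe the merged filter.
    static boolean mightContain(BitSet filter, String key) {
        for (int i = 0; i < NUM_HASHES; i++) {
            if (!filter.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<BitSet> locals = new ArrayList<>();
        locals.add(buildLocal(Arrays.asList("k1", "k2"))); // "node 1"
        locals.add(buildLocal(Arrays.asList("k3")));       // "node 2"
        BitSet merged = merge(locals);
        // A key inserted on any node is always found in the merged filter.
        System.out.println(mightContain(merged, "k3")); // prints true
    }
}
```

The merge must not start until every local filter is complete, which is the
"stop sign" asked about at the start of the thread. Here that ordering is
implicit because buildLocal returns before merge is called.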
