asterixdb-notifications mailing list archives

From "Xikui Wang (Code Review)" <do-not-re...@asterixdb.incubator.apache.org>
Subject Change in asterixdb[master]: [ASTERIXDB-2152][FUN][COMP] Enable specifying computation lo...
Date Tue, 14 Nov 2017 00:13:37 GMT
Xikui Wang has posted comments on this change.

Change subject: [ASTERIXDB-2152][FUN][COMP] Enable specifying computation location
......................................................................


Patch Set 12:

(2 comments)

Added two comments. One of them obviously exceeded the reviewer-friendly comment size limit.
Sorry about that. :)

https://asterix-gerrit.ics.uci.edu/#/c/2114/12/asterixdb/asterix-common/src/main/resources/asx_errormsg/en.properties
File asterixdb/asterix-common/src/main/resources/asx_errormsg/en.properties:

PS12, Line 121: Invalid computation location
> Yes, but it might be nice to report the invalid location if one is invalid.
Oh. I misunderstood your question. I thought you were asking whether this is possible. Will
address this in the next patch.


https://asterix-gerrit.ics.uci.edu/#/c/2114/12/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AssignPOperator.java
File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AssignPOperator.java:

PS12, Line 118: setLocationConstraint
> But I'm wondering why a location constraint is always needed for an assign 
Alright. I spent some time investigating the constraints. Let me see if I can convince you.
:)
Here we talk about the UDF in the feed case only, as we currently don't do anything special
for UDF evaluation in common queries.
1. The partition constraint here is slightly different from the locationConstraint in dataset
Ops, which is tied to physical properties. The location constraint here depends on the computation
locations (i.e., partitions), and it's decided dynamically during query compilation. The
user-specified parallelism level, which is similar to the countConstraint, is translated to
locationConstraints with computation locations assigned in a round-robin fashion.
2. We could also assign only a count constraint and let Hyracks decide which node to run on at
runtime. However, in the current implementation, the node assignment is random, which cannot
distribute the workload evenly. P.S. There is also a bug in the random assignment, and I submitted
another patch for it.
3. One possibility is to do round robin in the node assignment when starting tasks. However, Hyracks
treats all tasks equally. We can't really do round robin for the UDF evaluation tasks only.
In that sense, I guess assigning a location constraint here is probably better.
4. Currently, the locationConstraint for assign is only set in the feed context. The feed
datasource obtains the list of computation nodes, and we use that as the count constraint for UDF
evaluation. My feeling is that we have the full workload distribution information, but we ignore
the detailed answer and cross our fingers, hoping Hyracks gives us an answer...
5. Further, if we have advanced load balancing implemented in Hyracks, this should go away for
sure. :)
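
To make point 1 concrete, here is a minimal sketch of turning a parallelism level into
per-task locations in a round-robin fashion. This is only an illustration: the class
RoundRobinLocations, the method assign, and the node names nc1..nc3 are hypothetical and
are not taken from the patch or from the Hyracks API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RoundRobinLocations {
    // Hypothetical helper, not AsterixDB/Hyracks code: expands a parallelism
    // level into one location per task by cycling through the available nodes.
    static List<String> assign(List<String> nodes, int parallelism) {
        if (nodes.isEmpty() || parallelism <= 0) {
            throw new IllegalArgumentException("need at least one node and a positive parallelism");
        }
        List<String> locations = new ArrayList<>(parallelism);
        for (int i = 0; i < parallelism; i++) {
            // Task i goes to node (i mod n), so per-node counts differ by at most one.
            locations.add(nodes.get(i % nodes.size()));
        }
        return locations;
    }

    public static void main(String[] args) {
        // Assumed node names, for illustration only.
        System.out.println(assign(Arrays.asList("nc1", "nc2", "nc3"), 5));
        // prints [nc1, nc2, nc3, nc1, nc2]
    }
}
```

With a count constraint alone, the same five tasks could all land on one node under a
random assignment. The explicit list keeps the per-node task counts within one of each other.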


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/2114
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id7eed5dac03c2f260507e16cf687162d65787bd1
Gerrit-PatchSet: 12
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Xikui Wang <xkkwww@gmail.com>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Till Westmann <tillw@apache.org>
Gerrit-Reviewer: Xikui Wang <xkkwww@gmail.com>
Gerrit-HasComments: Yes
