hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Siddharth Seth <ss...@apache.org>
Subject Re: Review Request 51312: HIVE-14589 add consistent node replacement to LLAP for splits
Date Tue, 30 Aug 2016 22:35:56 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51312/#review147362
-----------------------------------------------------------



Possible to add a test (or a few)? (I don' see one in the patch attached to the jira).

Question: How long does it take for a persistent node to go away - in case of a JVM crash
/ node crash. Is it possible that a new process starts up within the duration it takes for
the old node-slot entry to be removed? (It gets added to the end as a result)

In terms of a permanent cluster size reduction - I think we should at least take some steps
to handle this scenario, or an equivalent scenario where it takes a long time (minutes) for
a node to come back. We have a force locality scheduling mode, this will effectively cause
new queries to start in a state where they cannot complete. Check for this and fail the query?,
or fallback to a next node for the split. The fallback to the next node is something that
is going to come in soon anyway, to control locality if nodes are not available (current is
always random as against 'random' being a configurable option :) )


llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
(line 104)
<https://reviews.apache.org/r/51312/#comment214497>

    Move this into a separate path altogether, instead of listing and filtering at the same
path each time?



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
(line 110)
<https://reviews.apache.org/r/51312/#comment214498>

    woekersPath is no long defined, so the comment can be deleted.



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
(line 533)
<https://reviews.apache.org/r/51312/#comment214487>

    There's a lot of similar code to read records within a loop, and a method to read records.
Move into a single method? Can be done in a separate jiras as well.



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
(line 547)
<https://reviews.apache.org/r/51312/#comment214501>

    The size of this collection is used to determine the #knownWorkers in HostAffinitySplitLocationProvider.
The size at the moment represents active nodes.
    
    I don't think the intent is to return a fewer number of nodes than what existed if a node
goes down? 
    1) Return n entries, and one entry would indicate it's not active (and this will need
to be acted upon by all clients
    2) Add a getNumInstances itnerface, which can differ from actual nodes available, along
with a getNodeAtLocation interface?



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
(line 554)
<https://reviews.apache.org/r/51312/#comment214496>

    Here, as well as other places, a node is only available when it's slot has been registered.
Otherwise it should not be visible to clients.



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
(line 577)
<https://reviews.apache.org/r/51312/#comment214495>

    Don't send back a node until it's slot registration has completed. We don't know what
position it will take. This logic puts such nodes at the end.



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java (line 51)
<https://reviews.apache.org/r/51312/#comment214489>

    Are there tests in curator for the said node, which we could borrow parts from?



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java (line 128)
<https://reviews.apache.org/r/51312/#comment214472>

    Nit: else {
      slots.add( ..)
    }
    
    Makes it a little more readable, and maintanable for future changes.



llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java (line 138)
<https://reviews.apache.org/r/51312/#comment214477>

    Potential divide by 0? (new cluster)
    
    What's the purpose of the 2.0f/approxWorkerCount ?



ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestHostAffinitySplitLocationProvider.java
(line 94)
<https://reviews.apache.org/r/51312/#comment214490>

    This isn't part of the patch uploaded to jira?


- Siddharth Seth


On Aug. 23, 2016, 1:41 a.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51312/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2016, 1:41 a.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   llap-client/src/java/org/apache/hadoop/hive/llap/registry/ServiceInstanceSet.java 99ead9b

>   llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java
e9456f2 
>   llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
64d2617 
>   llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java PRE-CREATION

>   llap-client/src/java/org/apache/hadoop/hive/llap/security/LlapTokenClient.java 921e050

>   llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusServiceDriver.java 17ce69b

>   llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
efd774d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HostAffinitySplitLocationProvider.java
c06499e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/Utils.java 8a4fc08 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestHostAffinitySplitLocationProvider.java
d98a5ff 
> 
> Diff: https://reviews.apache.org/r/51312/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message