hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Siddharth Seth <ss...@apache.org>
Subject Re: Review Request 51312: HIVE-14589 add consistent node replacement to LLAP for splits
Date Thu, 01 Sep 2016 19:24:45 GMT


> On Aug. 30, 2016, 10:35 p.m., Siddharth Seth wrote:
> > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java,
line 554
> > <https://reviews.apache.org/r/51312/diff/1/?file=1481607#file1481607line554>
> >
> >     The size of this collection is used to determine the #knownWorkers in HostAffinitySplitLocationProvider.
The size at the moment represents active nodes.
> >     
> >     I don't think the intent is to return a fewer number of nodes than what existed
if a node goes down? 
> >     1) Return n entries, and one entry would indicate it's not active (and this
will need to be acted upon by all clients
> >     2) Add a getNumInstances itnerface, which can differ from actual nodes available,
along with a getNodeAtLocation interface?
> 
> Sergey Shelukhin wrote:
>     HIVE-14680
> 
> Siddharth Seth wrote:
>     Think this is very incomplete without this change. I don't think it's any worse than
it is today though.
> 
> Sergey Shelukhin wrote:
>     Why? It handles consistency in most scenarios as described. There's just a small
window during the crash where splits will not get good locations.

The 'small window' is assuming the node comes back up in the fisrt place. Minutes is enough
to cause damage to the cache on multiple nodes. What happens on a cluster where a physical
node is taken out for maintenance, and there's no capacity to launch another node. What happens
in case of an hour long maintenence window for a node/set of nodes.

Will the consisten caching part take care of this ? (The target bucket size will be decreasing).

The bucket size could be based on the number of instances requested by the slider app.


> On Aug. 30, 2016, 10:35 p.m., Siddharth Seth wrote:
> > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java,
line 561
> > <https://reviews.apache.org/r/51312/diff/1/?file=1481607#file1481607line561>
> >
> >     Here, as well as other places, a node is only available when it's slot has been
registered. Otherwise it should not be visible to clients.
> 
> Sergey Shelukhin wrote:
>     I think it only makes sense for ordered, to not mess up the order. Changed that.
It's also problematic because we'd have to keep track of coordinated notifications.
> 
> Siddharth Seth wrote:
>     Assuming notifications for registered workers behave in the following manner.
>     
>     NewNode - only when the slot has been registered.
>     Node down - when either the slot or the actual node goes away.
>     State changed - I'm not sure anyone is expected to act on this (what are the possible
notifications here).
> 
> Sergey Shelukhin wrote:
>     Why? It doesn't seem to provide a lot of benefit compared to existing worker node
handling, in presence of crashes/etc.

It does provide consistency, and makes it easier for someone who does not know this specific
detail about the two ZK nodes per instance to reason about potential problems. Eseentially
easier to undestand and maintain.


- Siddharth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51312/#review147362
-----------------------------------------------------------


On Sept. 1, 2016, 6:35 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51312/
> -----------------------------------------------------------
> 
> (Updated Sept. 1, 2016, 6:35 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   llap-client/pom.xml 0243340 
>   llap-client/src/java/org/apache/hadoop/hive/llap/registry/ServiceInstanceSet.java 99ead9b

>   llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java
e9456f2 
>   llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
64d2617 
>   llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java PRE-CREATION

>   llap-client/src/java/org/apache/hadoop/hive/llap/security/LlapTokenClient.java 921e050

>   llap-client/src/test/org/apache/hadoop/hive/llap/registry/impl/TestSlotZnode.java PRE-CREATION

>   llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusServiceDriver.java 17ce69b

>   llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
efd774d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/Utils.java 8a4fc08 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestHostAffinitySplitLocationProvider.java
54f7363 
> 
> Diff: https://reviews.apache.org/r/51312/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message