samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jake Maes <jacob.m...@gmail.com>
Subject Re: Review Request 45144: SAMZA-906 Host Affinity - Minimize task reassignment when container count changes
Date Mon, 04 Apr 2016 21:15:50 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45144/
-----------------------------------------------------------

(Updated April 4, 2016, 9:15 p.m.)


Review request for samza, Navina Ramesh, Jagadish Venkatraman, and Yi Pan (Data Infrastructure).


Changes
-------

rebasing to all the patches that have recently been committed


Repository: samza


Description
-------

Persist the task-to-container mapping in the coordinator stream and use it to minimize the
reassignment when the container count changes.

A new BalancingTaskNameGrouper interface exposes the balance() method, which can be implemented
pluggably. 
GroupByContainerCount has been rewritten in java and the balance() functionality added. This
is because the balance logic is specific to a grouper.

Detailed changes:
import-control.xml - Update imports that weren't needed in scala and add some for tests.
LocalityManager.java - Add TaskAssignment manager. This mostly just keeps the JobCoordinator
code cleaner, but also associates the 2 managers for Host Affinity info.
GroupByContainerCount - THE BIG ONE. Rewritten in Java and it now implements BalancingTaskNameGrouper
GroupByContainerCountFactory - Rewritten in Java
TestStorageRecovery - Old test depended on the order of the partitions in a container. Now
it doesn't.
TestGroupByContainerCount - Rewritten in Java and lots of tests added for balance()
BalancingTaskNameGrouper - New interface for the balance method. Exposes the new functionality
without breaking backward compatibility
TaskAssignmentManager - Coordinator stream manager for task-to-container mapping
TaskNameGroupBalancer - Bridges the task mapping (balance) capability with taskname groupers,
old and new
SetTaskContainerMapping - Coordinator strem message for the task-to-container mapping


Diffs (updated)
-----

  checkstyle/import-control.xml b5bd365991bdee2d0bdad82aaf5d37a0b52760a5 
  samza-core/src/main/java/org/apache/samza/container/LocalityManager.java acf93525ea5c97df187bbe7977e2ae9fea65b32b

  samza-core/src/main/java/org/apache/samza/container/grouper/task/BalancingTaskNameGrouper.java
PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/container/grouper/task/GroupByContainerCount.java
PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/container/grouper/task/TaskAssignmentManager.java
PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/AbstractCoordinatorStreamManager.java
211b64215f26db49cd4411ff3fb41231145307d7 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetContainerHostMapping.java
4d093b500b7f3b582446634ced3e9d0b28371616 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetTaskContainerMapping.java
PRE-CREATION 
  samza-core/src/main/scala/org/apache/samza/container/grouper/task/GroupByContainerCount.scala
cb0a3bde15174c53f8eb3c0dbbb4f59dbf2589b1 
  samza-core/src/main/scala/org/apache/samza/container/grouper/task/GroupByContainerCountFactory.scala
8bbfd639cd9ea1d758d0daa45ce41093c1cb66f6 
  samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala cd7daa2f2be9e7f1f2fea8e3035c8f6c6f420e95

  samza-core/src/test/java/org/apache/samza/container/grouper/task/TestGroupByContainerCount.java
PRE-CREATION 
  samza-core/src/test/java/org/apache/samza/container/grouper/task/TestTaskAssignmentManager.java
PRE-CREATION 
  samza-core/src/test/java/org/apache/samza/container/mock/ContainerMocks.java PRE-CREATION

  samza-core/src/test/java/org/apache/samza/coordinator/stream/MockCoordinatorStreamSystemFactory.java
e0d4aa1016d61ce328d7ff74b58f7b8f7682f386 
  samza-core/src/test/java/org/apache/samza/coordinator/stream/MockCoordinatorStreamWrappedConsumer.java
429573b480112c7491303dc410d78f37a308c4a7 
  samza-core/src/test/java/org/apache/samza/storage/TestStorageRecovery.java 53207ad7e87fe491c6ae95ae6c590c6d5668d3dc

  samza-core/src/test/scala/org/apache/samza/container/grouper/task/TestGroupByContainerCount.scala
6e9c6fa579a5901000bea0601c771783d8334f0e 

Diff: https://reviews.apache.org/r/45144/diff/


Testing
-------

A bunch of new unit tests have been added. 

Also tested with a test job. The task mapping (initially missing) is added the first time
the job is run. It is then used as expected to reduce task reassignment as the container count
was adjusted from 4->3->5 on subsequent runs.


Thanks,

Jake Maes


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message