incubator-bigtop-dev mailing list archives

From "Stephen Chu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BIGTOP-635) Implement a cluster-abstraction, discovery and manipulation framework for iTest
Date Tue, 24 Jul 2012 01:41:35 GMT

    [ https://issues.apache.org/jira/browse/BIGTOP-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421089#comment-13421089 ]

Stephen Chu commented on BIGTOP-635:
------------------------------------

Thanks, Sujay.
 
Some initial comments/questions:

{code}
+   <dependency>
+     <groupId>org.apache.bigtop.itest</groupId>
+     <artifactId>itest-common</artifactId>
+     <version>0.2.0-incubating</version>
+   </dependency>
{code}

Should this be 0.5.0-incubating instead? Bigtop trunk test artifacts are using 0.5.0-incubating.

{code}
+     <dependency>
+         <groupId>org.apache.hadoop</groupId>
+         <artifactId>hadoop-mapreduce-client-core</artifactId>
+         <version>2.0.0-alpha</version>
+       </dependency>
+     <dependency>
+         <groupId>org.apache.hadoop</groupId>
+         <artifactId>hadoop-common</artifactId>
+         <version>2.0.0-alpha</version>
+     </dependency>
+     <dependency>
+         <groupId>org.apache.hadoop</groupId>
+         <artifactId>hadoop-common</artifactId>
+         <version>2.0.0-alpha</version>
+         <type>test-jar</type>
+     </dependency>
{code}

Bigtop trunk Hadoop tests are using 2.0.0-SNAPSHOT, not 2.0.0-alpha.

{code}
+public interface ClusterAdapter {
+ /**
+  * Cluster Daemons: NameNode, DataNode, JobTracker, TaskTracker, SecondaryNameNode, HRegionServer, HMaster
+  /**
{code}

The "Cluster Daemons:" comment seems unnecessary because the specific daemons are not referenced
in the rest of the class.


{code}
+  /**
+   * Shuts down HBase cluster
+   */
+  void hbaseShutdown();
{code}

In HDFSAdapter, there are stopHDFSservice, startHDFSservice, and restartHDFSservice (MRAdapter
follows the same style). For consistency, it seems we should have startHBaseService,
stopHBaseService, and restartHBaseService as well. Also, should we shorten these names to
stopHDFS/startHDFS? Tacking "Service" onto the end might be unnecessary. I think most people
will know what you mean.

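To make the naming suggestion concrete, here is a minimal sketch of what a consistent HBase adapter could look like. Everything in it is hypothetical: HBaseAdapter, RecordingHBaseAdapter, and the method names are illustrative and not taken from the patch, and the default method assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical adapter interface using the shorter start/stop/restart naming.
// None of these names come from the patch itself.
interface HBaseAdapter {
    void startHBase();

    void stopHBase();

    // Restart is expressed as stop followed by start, so implementations
    // only need to supply the two primitive operations.
    default void restartHBase() {
        stopHBase();
        startHBase();
    }
}

// Illustrative implementation that only records the calls it receives.
// A real adapter would talk to the cluster instead.
class RecordingHBaseAdapter implements HBaseAdapter {
    final List<String> calls = new ArrayList<>();

    @Override
    public void startHBase() {
        calls.add("start");
    }

    @Override
    public void stopHBase() {
        calls.add("stop");
    }
}
```

With this shape, the HDFS and MR adapters could follow the same three-method pattern, which keeps the API predictable across services.
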
{code}
private LinkedList<Host> cluster = new LinkedList<Host>();
{code}

Perhaps rename to clusterHosts? If I'm reading "cluster" in other parts of the code, I might
not quickly remember that it's a collection of Hosts.

{code}
+         //dataNode.refreshDaemons(); 
{code}

Remove if unnecessary.

{code}
+ public void waitUntilStarted(String daemon, Host hostname, long timeout) throws Exception {
+   assertTrue(hostname != null);
+   long endTime = System.currentTimeMillis() + timeout;
+   boolean done = false;
+     while (!done) {
+         if (System.currentTimeMillis() > endTime) {
+             throw new Exception("Timeout value reached");
+         }
+       for (Daemon d : hostname.getDaemons()) {
+         if (d.getName().equalsIgnoreCase(daemon)) {
+           done = true;
+         }
+       }
+     }
+ }
+ /**
+  * Stalls thread until specified daemon is stopped on specified machine or timeout value is reached.
+  * @throws Exception
+  */
+ public void waitUntilStopped(String daemon, Host hostname, long timeout) throws Exception {
+   assertTrue(hostname != null);
+   long endTime = System.currentTimeMillis() + timeout;
+   boolean done = false;
+     while (!done) {
+         if (System.currentTimeMillis() > endTime) {
+             throw new Exception("Timeout value reached");
+         }
+       boolean isStopped = true;
+       for (Daemon d : hostname.getDaemons()) {
+         if (d.getName().equalsIgnoreCase(daemon)) {
+           isStopped = false;
+         }
+       }
+       if (isStopped) {
+         done = true;
+       }
+     }
+ }
{code}

These two methods share most of their logic, so it seems we can refactor them into a common helper.
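
One possible shape for that refactoring is sketched below. It is only an illustration: Daemon and Host are simplified stand-ins for the patch's classes, DaemonWaiter, isRunning, and waitForState are hypothetical names, and the 100 ms sleep between polls is an assumption added to avoid a busy loop. The sketch also throws IllegalArgumentException instead of calling assertTrue so that it stays self-contained, and it checks the daemon state before the deadline rather than after.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeoutException;

// Simplified stand-in for the patch's Daemon class (hypothetical).
class Daemon {
    private final String name;

    Daemon(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Simplified stand-in for the patch's Host class (hypothetical).
class Host {
    private final List<Daemon> daemons = new ArrayList<>();

    List<Daemon> getDaemons() {
        return daemons;
    }
}

class DaemonWaiter {
    // True if a daemon with the given name (case-insensitive) is listed on the host.
    static boolean isRunning(String daemon, Host host) {
        for (Daemon d : host.getDaemons()) {
            if (d.getName().equalsIgnoreCase(daemon)) {
                return true;
            }
        }
        return false;
    }

    // Shared polling loop: blocks until the daemon's running state matches
    // wantRunning, or throws once the timeout has elapsed.
    static void waitForState(String daemon, Host host, boolean wantRunning, long timeoutMillis)
            throws TimeoutException, InterruptedException {
        if (host == null) {
            throw new IllegalArgumentException("host must not be null");
        }
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (isRunning(daemon, host) != wantRunning) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("Timeout value reached");
            }
            Thread.sleep(100); // assumed poll interval, not from the patch
        }
    }

    static void waitUntilStarted(String daemon, Host host, long timeoutMillis)
            throws TimeoutException, InterruptedException {
        waitForState(daemon, host, true, timeoutMillis);
    }

    static void waitUntilStopped(String daemon, Host host, long timeoutMillis)
            throws TimeoutException, InterruptedException {
        waitForState(daemon, host, false, timeoutMillis);
    }
}
```

The two public methods then differ only in the boolean they pass, so a fix to the timeout handling lands in one place.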

{code}
+   if (onNamenode) {
+
+   }
+   else {
+     runShellCommand("sudo -u hdfs hdfs haadmin -failover " + active + " " + standby, active_host, false, false);
+   }
{code}

I think we can just call shHDFS.exec("hdfs haadmin -failover " + active + " " + standby);
If we can get the hdfs user's shell on any node in the cluster, we should be able to
perform the failover from it.
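
A rough sketch of that idea follows, with the shell passed in so the branch on onNamenode goes away. HaFailover and its methods are hypothetical names, and the Consumer<String> is only a stand-in for whatever runs a command as the hdfs user (shHDFS.exec in the suggestion above); it is not the iTest Shell API.

```java
import java.util.function.Consumer;

class HaFailover {
    // Builds the haadmin failover command for the given NameNode IDs.
    static String failoverCommand(String active, String standby) {
        return "hdfs haadmin -failover " + active + " " + standby;
    }

    // Runs the failover through any executor that already runs commands as
    // the hdfs user; no "sudo -u hdfs" prefix or per-host branching is needed.
    static void failover(Consumer<String> hdfsShell, String active, String standby) {
        hdfsShell.accept(failoverCommand(active, standby));
    }
}
```

In a test, the executor could be a lambda that forwards to the hdfs shell, for example `cmd -> shHDFS.exec(cmd)` if such an object is in scope.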

{code}
+++ bigtop-test-framework/src/main/groovy/org/apache/bigtop/itest/clustermanager/distributions/VersionAClusterManager.java
{code}

Should we start thinking of a different name for this? Maybe BigtopClusterManager like you
mentioned before. 

{code}
+++ bigtop-test-framework/src/test/groovy/org/apache/bigtop/itest/clustermanager/HAMRBCMHelperThread.java
+++ bigtop-test-framework/src/test/groovy/org/apache/bigtop/itest/clustermanager/TestHAMRBCM.java
{code}

We should move these tests into the Bigtop Hadoop test artifacts.
                
> Implement a cluster-abstraction, discovery and manipulation framework for iTest
> -------------------------------------------------------------------------------
>
>                 Key: BIGTOP-635
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-635
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Tests
>    Affects Versions: 0.4.0
>            Reporter: Roman Shaposhnik
>            Assignee: Sujay Rau
>             Fix For: 0.5.0
>
>         Attachments: BigtopClusterManager.zip, BigtopClusterManagerv2.zip, ClusterManagerAPI.pdf, bigtop-635.patch
>
>
> We've come to a point where our tests need to have a uniform way of interfacing with the cluster under test. It is no longer ok to assume that the test can be executed on a particular node (and thus have access to services running on it). It is also less than ideal for tests to assume a particular type of interaction with the services since it tends to break in different deployment scenarios.
> A framework that needs to be put in place has to be capable of (regardless of where a test using it is executed on):
>   # representing the abstract configuration of the cluster
>   # representing the abstract topology of the entire cluster (services running on a cluster, nodes hosting the daemons, racks, etc).
>   # giving tests an ability to query this topology
>   # giving tests an ability to affect the nodes in that topology in a particular way (refreshing configuration, restarting services, etc.)
> Of course, the ideal solution here would be to give Bigtop tests a programmatic access to a Hadoop cluster management framework such as Cloudera's CM or Apache Ambari.
> As with any ideal solutions I don't think it is realistic though. Hence we have to cook something up. At this point I'm really focused on getting the API right and I'm totally fine with an implementation of that API to be something as silly as a bunch of ssh-based scripts or something.
> This JIRA is primarily focused on coming up with such an API. Anybody who's willing to help is welcome to.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira