asterixdb-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Till Westmann (Code Review)" <do-not-re...@asterixdb.incubator.apache.org>
Subject Change in asterixdb[master]: Asterix NCs Fault Tolerance
Date Sat, 16 Jan 2016 07:18:56 GMT
Till Westmann has posted comments on this change.

Change subject: Asterix NCs Fault Tolerance
......................................................................


Patch Set 3:

(47 comments)

Mostly typos and whitespace - only a few code questions.

https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-common/src/main/java/org/apache/asterix/common/config/AsterixReplicationProperties.java
File asterix-common/src/main/java/org/apache/asterix/common/config/AsterixReplicationProperties.java:

Line 35:     //we could centralized then a single configuration file.
I think that it would be better to put this into a JIRA issue (ideally with a list of locations
where the default values are).


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-common/src/main/java/org/apache/asterix/common/transactions/IRecoveryManager.java
File asterix-common/src/main/java/org/apache/asterix/common/transactions/IRecoveryManager.java:

Line 116:      * Reply the logs that belong to the passed {@code partitions} starting from
the {@code lowWaterMarkLSN}
s/Reply/Replay/


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/clusterts/cluster_with_replication.xml
File asterix-installer/src/test/resources/clusterts/cluster_with_replication.xml:

Line 19:     <cluster xmlns="cluster">
Fix indentation?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/integrationts/replication/queries/failover/bulkload/bulkload.1.ddl.aql
File asterix-installer/src/test/resources/integrationts/replication/queries/failover/bulkload/bulkload.1.ddl.aql:

Line 22:  * start 2 nodes, bulkload a dataset, query data, kill one node and wait until the
failover complete, query the data again.
line break?


Line 28:     create dataverse TinySocial;
Indentation?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/integrationts/replication/queries/failover/bulkload/bulkload.2.update.aql
File asterix-installer/src/test/resources/integrationts/replication/queries/failover/bulkload/bulkload.2.update.aql:

Line 22:  * start 2 nodes, bulkload a dataset, query data, kill one node and wait until the
failover complete, query the data again.
line break?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/integrationts/replication/queries/failover/bulkload/bulkload.3.txnqbc.aql
File asterix-installer/src/test/resources/integrationts/replication/queries/failover/bulkload/bulkload.3.txnqbc.aql:

Line 22:  * start 2 nodes, bulkload a dataset, query data, kill one node and wait until the
failover complete, query the data again.
line break?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/integrationts/replication/queries/failover/bulkload/bulkload.6.txnqar.aql
File asterix-installer/src/test/resources/integrationts/replication/queries/failover/bulkload/bulkload.6.txnqar.aql:

Line 22:  * start 2 nodes, bulkload a dataset, query data, kill one node and wait until the
failover complete, query the data again.
line break?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/integrationts/replication/queries/failover/mem_component_recovery/mem_component_recovery.1.ddl.aql
File asterix-installer/src/test/resources/integrationts/replication/queries/failover/mem_component_recovery/mem_component_recovery.1.ddl.aql:

Line 22:  * start 2 nodes, bulkload a dataset, copy it to in-memory dataset, query data from
memory, kill one node and wait until the failover complete, query the data again.
line break?


Line 28:     create dataverse TinySocial;
Indentation?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/integrationts/replication/queries/failover/mem_component_recovery/mem_component_recovery.2.update.aql
File asterix-installer/src/test/resources/integrationts/replication/queries/failover/mem_component_recovery/mem_component_recovery.2.update.aql:

Line 22:  * start 2 nodes, bulkload a dataset, copy it to in-memory dataset, query data from
memory, kill one node and wait until the failover complete, query the data again.
line break?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/integrationts/replication/queries/failover/mem_component_recovery/mem_component_recovery.3.txnqbc.aql
File asterix-installer/src/test/resources/integrationts/replication/queries/failover/mem_component_recovery/mem_component_recovery.3.txnqbc.aql:

Line 22:  * start 2 nodes, bulkload a dataset, copy it to in-memory dataset, query data from
memory, kill one node and wait until the failover complete, query the data again.
line break?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/integrationts/replication/queries/failover/mem_component_recovery/mem_component_recovery.6.txnqar.aql
File asterix-installer/src/test/resources/integrationts/replication/queries/failover/mem_component_recovery/mem_component_recovery.6.txnqar.aql:

Line 22:  * start 2 nodes, bulkload a dataset, copy it to in-memory dataset, query data from
memory, kill one node and wait until the failover complete, query the data again.
line break


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/integrationts/replication/queries/failover/metadata_node/metadata_node.1.ddl.aql
File asterix-installer/src/test/resources/integrationts/replication/queries/failover/metadata_node/metadata_node.1.ddl.aql:

Line 22:  * start 2 nodes, create a dataset, kill metadata node and wait until the failover
complete, verify the dataset still exists.
line break?


Line 28:     create dataverse TinySocial;
Indentation?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/integrationts/replication/queries/failover/metadata_node/metadata_node.2.txnqbc.aql
File asterix-installer/src/test/resources/integrationts/replication/queries/failover/metadata_node/metadata_node.2.txnqbc.aql:

Line 22:  * start 2 nodes, create a dataset, kill metadata node and wait until the failover
complete, verify the dataset still exists.
line break?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-installer/src/test/resources/integrationts/replication/queries/failover/metadata_node/metadata_node.5.txnqar.aql
File asterix-installer/src/test/resources/integrationts/replication/queries/failover/metadata_node/metadata_node.5.txnqar.aql:

Line 22:  * start 2 nodes, create a dataset, kill metadata node and wait until the failover
complete, verify the dataset still exists.
line break?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-om/src/main/java/org/apache/asterix/om/util/AsterixClusterProperties.java
File asterix-om/src/main/java/org/apache/asterix/om/util/AsterixClusterProperties.java:

Line 301:         AsterixReplicationProperties repliactionProperties = AsterixAppContextInfo.getInstance()
a/repliactionProperties/replicationProperties/


Line 313:                 if (ncConfiguration.containsKey(replica)) {
What does it mean, if we got the replica from the replication properties, but it is not in
the NC configuration? Why is ignoring the replica the right approach?


Line 337:                 e.printStackTrace();
This looks a little scary. Why is it ok to ignore the exception?


Line 382:             e.printStackTrace();
Should we just continue to run if this fails?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-replication/src/main/java/org/apache/asterix/replication/management/ReplicationChannel.java
File asterix-replication/src/main/java/org/apache/asterix/replication/management/ReplicationChannel.java:

Line 341:                     String compoentId = LSMComponentProperties.getLSMComponentID(afp.getFilePath());
s/compoentId/componentId/


Line 381:             Map<String, ClusterPartition[]> nodePartitions = ((IAsterixPropertiesProvider)
asterixAppRuntimeContextProvider
line break?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-replication/src/main/java/org/apache/asterix/replication/management/ReplicationManager.java
File asterix-replication/src/main/java/org/apache/asterix/replication/management/ReplicationManager.java:

Line 123:     private String hostIPAddressFirstOctet = null;
This doesn't seem to be a good idea. Why would multiple network interfaces have a different
first byte?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-replication/src/main/java/org/apache/asterix/replication/recovery/RemoteRecoveryManager.java
File asterix-replication/src/main/java/org/apache/asterix/replication/recovery/RemoteRecoveryManager.java:

Line 62:         //Currently we will not allow a node to perform remote recovery since another
replica already tookover its workload
line break?


Line 73:             //start recovery recovery steps
Just 1 recovery?


Line 88:                     throw new IllegalStateException("no ACTIVE remote replica(s)
exists to performe remote recovery");
s/performe/perform/


Line 103:                 maxRemoteLSN = replicationManager.getMaxRemoteLSN(selectedRemoteReplicas.keySet());
Why is maxRemoteLSN initialized separately?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-replication/src/main/java/org/apache/asterix/replication/storage/LSMComponentProperties.java
File asterix-replication/src/main/java/org/apache/asterix/replication/storage/LSMComponentProperties.java:

Line 142:     public void setComponentId(String componentId) {
unused?


Line 150:     public void setOriginalLSN(long lSN) {
unused?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-replication/src/main/java/org/apache/asterix/replication/storage/LSMIndexFileProperties.java
File asterix-replication/src/main/java/org/apache/asterix/replication/storage/LSMIndexFileProperties.java:

Line 46:     public LSMIndexFileProperties(String filePath, long fileSize, String nodeId,
boolean lsmComponentFile,
unused


Line 96:         LSMIndexFileProperties fileProp = new LSMIndexFileProperties();
Just use the constructor with arguments?


Line 105:     public void setFilePath(String filePath) {
unused?


Line 117:     public void setFileName(String fileName) {
unused?


Line 137:     public void setFileSize(long fileSize) {
unused?


Line 145:     public void setIdxName(String idxName) {
unused?


Line 153:     public void setLsmComponentFile(boolean lsmComponentFile) {
unused?


Line 161:     public void setRequiresAck(boolean requiresAck) {
unused?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-replication/src/main/java/org/apache/asterix/replication/storage/ReplicaResourcesManager.java
File asterix-replication/src/main/java/org/apache/asterix/replication/storage/ReplicaResourcesManager.java:

Line 78:                 }
Any reason why we don't use FileUtils.deleteQuietly here as well?


Line 126:         }
Could we just use Files.deleteIfExists here?


Line 337:             }
"return name.endsWith(LSM_COMPONENT_MASK_SUFFIX);" ?


Line 347:             }
"return !name.endsWith(LSM_COMPONENT_MASK_SUFFIX);"?


Line 353:             if (name.equalsIgnoreCase(PersistentLocalResourceRepository.METADATA_FILE_NAME))
{
"return name.equalsIgnoreCase(PersistentLocalResourceRepository.METADATA_FILE_NAME) || !name.startsWith(".");"
?


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/resource/PersistentLocalResourceRepository.java
File asterix-transactions/src/main/java/org/apache/asterix/transaction/management/resource/PersistentLocalResourceRepository.java:

Line 166:                 ObjectOutputStream oosToFos = new ObjectOutputStream(fos)) {
Nice, AutoCloseables :)


https://asterix-gerrit.ics.uci.edu/#/c/573/3/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/recovery/RecoveryManager.java
File asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/recovery/RecoveryManager.java:

Line 1119:         //-------------------------------------------------------------------------
I think it would be nice to split this into 2 methods - one for the analysis phase and 1 for
the redo phase. That would make a little clearer what is local to each phase and what is relevant
across phases.
However, I'm not entirely sure how feasible this is.


Line 1155:                                     //if we don't have enough memory for one more
job, we will force all jobs to spill their cached entities to disk.
Line breaks?


Line 1268:                                     datasetLifecycleManager.open(resourceAbsolutePath);
It seems that these don't get closed if something throws ...


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/573
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ice26d980912a315fcb3efdd571d6ce88717cfea4
Gerrit-PatchSet: 3
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Murtadha Hubail <hubailmor@gmail.com>
Gerrit-Reviewer: Ian Maxon <imaxon@apache.org>
Gerrit-Reviewer: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Murtadha Hubail <hubailmor@gmail.com>
Gerrit-Reviewer: Till Westmann <tillw@apache.org>
Gerrit-Reviewer: abdullah alamoudi <bamousaa@gmail.com>
Gerrit-HasComments: Yes

Mime
View raw message