hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chen Liang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-13439) Fix race between TestMetricsSystemImpl and TestGangliaMetrics
Date Thu, 04 Aug 2016 22:37:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408591#comment-15408591
] 

Chen Liang edited comment on HADOOP-13439 at 8/4/16 10:37 PM:
--------------------------------------------------------------

Thanks for such detailed feedbacks [~iwasakims], it helped a lot!

Given what you mentioned here, I did some more investigation, and I think I verified that
it is possible for one test to read config created by another. 

The two classes both have this line:
 {code}.save(TestMetricsConfig.getTestFilename("hadoop-metrics2-test"));{code}
Which writes the config to the exactly same file "hadoop-metrics2-test.properties" first.
Then both classes have the call:
{code}ms.start();{code}
Which reads the file. So there can be race condition from the write to the read. This error
can be reproduced by adding delay between .save() and ms.start() call, and run the tests at
the same time.

Another problem with this race condition is the missing metric error of checkMetrics() as
mentioned in HADOOP-12588. For example, one class might have the following in its config
{code}.add("test.source.s1.metric.filter.exclude", "X*"){code}
which configures to remove Xxx from the metric sources. If the other class is expecting this
metric, we would end up getting this error:
{code}java.lang.AssertionError: Missing metrics: test.s1rec.Xxx{code}
One solution I'm thinking of is to separate the config file they are writing to by introducing
a unique prefix for each test case, such that each test uses a different config file, similar
to RollingFileSystemSinkTestBase#initMetricsSystem. The pros is that we are still able to
run test cases in parallel without race condition, and the cons is that there might be a couple
tens more .properties generated. What do you think about this approach? Or do you have any
other suggestions?


was (Author: vagarychen):
Thanks for such detailed feedbacks [~iwasakims], it helped a lot!

Given what you mentioned here, I did some more investigation, and I think I verified that
it is possible for one test to read config created by another. 

The two classes both have this line:
 {code}.save(TestMetricsConfig.getTestFilename("hadoop-metrics2-test"));{code}
Which writes the config to the exactly same file "hadoop-metrics2-test.properties" first.
Then both classes have the call:
{code}ms.start();{code}
Which reads the file. So there can be race condition from the write to the read. This error
can be reproduced by adding delay between .save() and ms.start() call, and run the tests at
the same time.

Another problem with this race condition is the missing metric error of checkMetrics() as
mentioned in HADOOP-12588. For example, one class might have the following in its config
{code}.add("test.source.s1.metric.filter.exclude", "X*"){code}
which configures to remove Xxx from the metric sources. If the other class is expecting this
metric, we would end up getting this error:
{code}java.lang.AssertionError: Missing metrics: test.s1rec.Xxx{code}
One solution I'm thinking of is to separate the config file they are writing to by introducing
a unique prefix for each test case, such that each test uses a different config file, similar
to RollingFileSystemSinkTestBase#initMetricsSystem. The pros is that we are still able to
run test cases in parallel without race condition, and the cons is that there might be a couple
tens more .properties generated. What do you think about this approach? Or do you have any
suggestions?

> Fix race between TestMetricsSystemImpl and TestGangliaMetrics
> -------------------------------------------------------------
>
>                 Key: HADOOP-13439
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13439
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: test
>            Reporter: Masatake Iwasaki
>            Assignee: Chen Liang
>            Priority: Minor
>
> TestGangliaMetrics#testGangliaMetrics2 set *.period to 120 but 8 was used.
> {noformat}
> 2016-06-27 15:21:31,480 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(375))
- Scheduled snapshot period at 8 second(s).
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message