hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Oleksandr Shevchenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-7133) History Server task attempts REST API returns invalid data
Date Thu, 13 Sep 2018 09:24:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613238#comment-16613238
] 

Oleksandr Shevchenko commented on MAPREDUCE-7133:
-------------------------------------------------

Thank you [~jlowe] for the clarification. Good point about backporting into previous versions.
Attached new patch. Created the separated ticket for refactoring MAPREDUCE-7140 (I will attach
the patch for this after merging MAPREDUCE-7133).

> History Server task attempts REST API returns invalid data
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-7133
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7133
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>            Reporter: Oleksandr Shevchenko
>            Assignee: Oleksandr Shevchenko
>            Priority: Major
>         Attachments: MAPREDUCE-7133.001.patch, MAPREDUCE-7133.002.patch, MAPREDUCE-7133.003.patch,
MAPREDUCE-7133.004.patch, MAPREDUCE-7133.005.patch
>
>
> When we send a request to History Server with headers : Accept: application/json [https://nodename:19888/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_000003/attempts|https://192.168.121.199:19890/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_000003/attempts] 
> we get the following JSON:
> {code:java}
> {
> "taskAttempts": {
> "taskAttempt": [{
> "type": "reduceTaskAttemptInfo",
> "startTime": 1535372984638,
> "finishTime": 1535372986149,
> "elapsedTime": 1511,
> "progress": 100.0,
> "id": "attempt_1535363926925_0040_r_000003_0",
> "rack": "/default-rack",
> "state": "SUCCEEDED",
> "status": "reduce > reduce",
> "nodeHttpAddress": "node2.cluster.com:8044",
> "diagnostics": "",
> "type": "REDUCE",
> "assignedContainerId": "container_e01_1535363926925_0040_01_000006",
> "shuffleFinishTime": 1535372986056,
> "mergeFinishTime": 1535372986075,
> "elapsedShuffleTime": 1418,
> "elapsedMergeTime": 19,
> "elapsedReduceTime": 74
> }]
> }
> }
> {code}
> As you can see "type" property has duplicates:
> "type": "reduceTaskAttemptInfo"
> "type": "REDUCE"
> It's lead to an error during parsing response body as JSON is not valid.
> When we use application/xml we get the following response:
> {code:java}
> <taskAttempts>
> <taskAttempt xmlns:xsi="[http://www.w3.org/2001/XMLSchema-instance]" xsi:type="reduceTaskAttemptInfo"><startTime>1535372984638</startTime><finishTime>1535372986149</finishTime><elapsedTime>1511</elapsedTime><progress>100.0</progress><id>attempt_1535363926925_0040_r_000003_0</id><rack>/default-rack</rack><state>SUCCEEDED</state><status>reduce
> reduce</status><nodeHttpAddress>[node2.cluster.com:8044|http://node2.cluster.com:8044]</nodeHttpAddress><diagnostics/><type>REDUCE</type><assignedContainerId>container_e01_1535363926925_0040_01_000006</assignedContainerId><shuffleFinishTime>1535372986056</shuffleFinishTime><mergeFinishTime>1535372986075</mergeFinishTime><elapsedShuffleTime>1418</elapsedShuffleTime><elapsedMergeTime>19</elapsedMergeTime><elapsedReduceTime>74</elapsedReduceTime></taskAttempt>
> </taskAttempts>
> {code}
> Take a look at the following string:
> {code:java}
> <taskAttempt xmlns:xsi="[http://www.w3.org/2001/XMLSchema-instance]" xsi:type="reduceTaskAttemptInfo">
> {code}
> We got "xsi:type" attribute which incorectly marshall later to duplicated field if we
use JSON format.
> It acceptable only to REDUCE task. For MAP task we get xml without "xsi:type" attribute.
> {code:java}
> <taskAttempts>
> <taskAttempt>
> <startTime>1535370756528</startTime>
> <finishTime>1535370760318</finishTime>
> <elapsedTime>3790</elapsedTime>
> <progress>100.0</progress>
> <id>attempt_1535363926925_0029_m_000001_0</id>
> <rack>/default-rack</rack>
> <state>SUCCEEDED</state>
> <status>map > sort</status>
> <nodeHttpAddress>[node2.cluster.com:8044|http://node2.cluster.com:8044]</nodeHttpAddress>
> <diagnostics/>
> <type>MAP</type>
> <assignedContainerId>container_e01_1535363926925_0029_01_000003</assignedContainerId>
> </taskAttempt>
> </taskAttempts>
> {code}
> This happens since we have two different hierarchical classes for MAP ->TaskAttemptInfo
and REDUCE- > ReduceTaskAttemptInfo tasks.
> ReduceTaskAttemptInfo extends TaskAttemptInfo, later we marshal all tasks (map and reduce)
by TaskAttemptsInfo.getTaskAttempt(). In this place, we do not have any information about ReduceTaskAttemptInfo
type as we store all tasks in ArrayList<TaskAttemptInfo>. 
> During marshaling we see that actual type of task ReduceTaskAttemptInfo instead of TaskAttemptsInfo
and add meta information for this. That's why we get duplicated fields.
> Unfortunately we do not catch it before in TestHsWebServicesAttempts since we use 
> org.codehaus.jettison.json.JSONObject library which overrides duplicated fields. Even
when we use Postman to do request we get valid JSON. Only when we change represent type to
Raw we can notice this issue. Also, we able to reproduce this bug by using "org.json:json"
lib:
> Something like this:
> {code:java}
> BufferedReader inReader = new BufferedReader( new InputStreamReader(connection.getInputStream()
) );
>  String inputLine;
>  StringBuilder response = new StringBuilder();
> while ( (inputLine = inReader.readLine()) != null ) {
>  response.append(inputLine);
>  }
> inReader.close();
> JSONObject o = new JSONObject(response.toString());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org


Mime
View raw message