hadoop-mapreduce-issues mailing list archives

From "Oleksandr Shevchenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-7133) History Server task attempts REST API returns invalid data
Date Wed, 12 Sep 2018 07:17:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611695#comment-16611695
] 

Oleksandr Shevchenko commented on MAPREDUCE-7133:
-------------------------------------------------

Thank you [~jlowe] for the detailed review!
{quote}TaskAttemptInfo is now an abstract class with no abstract methods? I'm not sure the
MapTaskAttemptInfo changes are really necessary here...
{quote}
I marked TaskAttemptInfo as an abstract class to emphasize that we shouldn't use it to create
instances; we should use the Map and Reduce subclasses instead. If TaskAttemptInfo is not
abstract, someone may later instantiate it instead of MapTaskAttemptInfo without noticing
that MapTaskAttemptInfo exists, which would lead to promiscuous use of these classes. So I
do not see anything bad in an abstract class without abstract methods; it makes sense here
and is a common pattern. But if it is confusing when looking from the outside, I can change it.
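As a minimal, self-contained sketch of that point (the class names below are simplified stand-ins, not the actual Hadoop classes):

```java
// An abstract class with no abstract methods still prevents direct
// instantiation, which is the whole intent here: callers must pick a
// concrete subclass.
abstract class TaskAttemptInfoSketch {
    String id;
    TaskAttemptInfoSketch(String id) { this.id = id; }
}

class MapTaskAttemptInfoSketch extends TaskAttemptInfoSketch {
    MapTaskAttemptInfoSketch(String id) { super(id); }
}

class ReduceTaskAttemptInfoSketch extends TaskAttemptInfoSketch {
    long shuffleFinishTime;
    ReduceTaskAttemptInfoSketch(String id) { super(id); }
}

public class AbstractDemo {
    public static void main(String[] args) {
        // new TaskAttemptInfoSketch("x") would be a compile-time error.
        TaskAttemptInfoSketch m = new MapTaskAttemptInfoSketch("attempt_m_0");
        TaskAttemptInfoSketch r = new ReduceTaskAttemptInfoSketch("attempt_r_0");
        System.out.println(m.getClass().getSimpleName());
        System.out.println(r.getClass().getSimpleName());
    }
}
```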

About creating MapTaskAttemptInfo: you are right that the minimal change is just adding
XmlElementRef to the task list in TaskAttemptsInfo. But I thought creating MapTaskAttemptInfo
is a good thing, since I expected to find such a class when I started to debug and spent some
time understanding the hierarchy and inheritance of the XML annotations. It is also needed
to unify constructor access for both Map and Reduce tasks, and to keep the duplicated XML
annotations (like XmlAccessorType) only on the base class. And, as you said, this is much
better for code readability.

I agree that we could do these changes in a separate commit. But I thought this patch is a
suitable place for them, since we are changing the logic of XML/JSON marshaling and the
inheritance scope of the XML annotations.

What do you think? What is the appropriate way to do these changes? I think they are really
needed. Do we need to create a separate ticket for refactoring TaskAttemptInfo? Or is it
better not to split them from the related changes into separate commits?
{quote}Speaking of the XMLElementRef, there should be a comment explaining why it's very important
to use that rather than FIELD accessor type.
{quote}
I agree that the use of the XMLElementRef annotations can be non-obvious. I thought anyone
could find this information in the commit message; that is the recommended way to document
changes like this (per HowToContribute and my observations). But you are right that there is
a chance of involuntary removal. I'll add the comment if you don't think it is redundant (in
TaskAttemptsInfo, in the unit test, or in both).
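To illustrate why that comment matters (a simplified stdlib-only sketch; the class names are hypothetical stand-ins): when subtypes sit in a base-typed list, a generic marshaler only sees the static base type and must record the runtime subtype as extra metadata (JAXB's xsi:type) to round-trip it, which is what leaked into the JSON as the duplicate "type" field. XmlElementRef instead lets JAXB pick a distinct element name per subtype.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the real Hadoop classes.
abstract class BaseAttempt { }
class MapAttempt extends BaseAttempt { }
class ReduceAttempt extends BaseAttempt { }

public class StaticTypeDemo {
    public static void main(String[] args) {
        List<BaseAttempt> attempts = new ArrayList<>();
        attempts.add(new MapAttempt());
        attempts.add(new ReduceAttempt());
        for (BaseAttempt a : attempts) {
            // The declared type here is BaseAttempt; only reflection reveals
            // the runtime subtype -- exactly the information a marshaler must
            // encode somehow (e.g. as xsi:type) when the annotations don't
            // give each subtype its own element name.
            System.out.println(a.getClass().getSimpleName());
        }
    }
}
```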

Thanks!

> History Server task attempts REST API returns invalid data
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-7133
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7133
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>            Reporter: Oleksandr Shevchenko
>            Assignee: Oleksandr Shevchenko
>            Priority: Major
>         Attachments: MAPREDUCE-7133.001.patch, MAPREDUCE-7133.002.patch, MAPREDUCE-7133.003.patch,
MAPREDUCE-7133.004.patch
>
>
> When we send a request to the History Server with the header Accept: application/json (e.g. https://nodename:19888/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_000003/attempts),
> we get the following JSON:
> {code:java}
> {
>   "taskAttempts": {
>     "taskAttempt": [{
>       "type": "reduceTaskAttemptInfo",
>       "startTime": 1535372984638,
>       "finishTime": 1535372986149,
>       "elapsedTime": 1511,
>       "progress": 100.0,
>       "id": "attempt_1535363926925_0040_r_000003_0",
>       "rack": "/default-rack",
>       "state": "SUCCEEDED",
>       "status": "reduce > reduce",
>       "nodeHttpAddress": "node2.cluster.com:8044",
>       "diagnostics": "",
>       "type": "REDUCE",
>       "assignedContainerId": "container_e01_1535363926925_0040_01_000006",
>       "shuffleFinishTime": 1535372986056,
>       "mergeFinishTime": 1535372986075,
>       "elapsedShuffleTime": 1418,
>       "elapsedMergeTime": 19,
>       "elapsedReduceTime": 74
>     }]
>   }
> }
> {code}
> As you can see, the "type" property is duplicated:
> "type": "reduceTaskAttemptInfo"
> "type": "REDUCE"
> This leads to an error when parsing the response body, since the JSON is not valid.
> When we use application/xml we get the following response:
> {code:java}
> <taskAttempts>
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="reduceTaskAttemptInfo"><startTime>1535372984638</startTime><finishTime>1535372986149</finishTime><elapsedTime>1511</elapsedTime><progress>100.0</progress><id>attempt_1535363926925_0040_r_000003_0</id><rack>/default-rack</rack><state>SUCCEEDED</state><status>reduce > reduce</status><nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress><diagnostics/><type>REDUCE</type><assignedContainerId>container_e01_1535363926925_0040_01_000006</assignedContainerId><shuffleFinishTime>1535372986056</shuffleFinishTime><mergeFinishTime>1535372986075</mergeFinishTime><elapsedShuffleTime>1418</elapsedShuffleTime><elapsedMergeTime>19</elapsedMergeTime><elapsedReduceTime>74</elapsedReduceTime></taskAttempt>
> </taskAttempts>
> {code}
> Take a look at the following string:
> {code:java}
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="reduceTaskAttemptInfo">
> {code}
> We get an "xsi:type" attribute which is later incorrectly marshaled into a duplicated field
when we use the JSON format.
> This applies only to REDUCE tasks. For MAP tasks we get XML without the "xsi:type" attribute.
> {code:java}
> <taskAttempts>
> <taskAttempt>
> <startTime>1535370756528</startTime>
> <finishTime>1535370760318</finishTime>
> <elapsedTime>3790</elapsedTime>
> <progress>100.0</progress>
> <id>attempt_1535363926925_0029_m_000001_0</id>
> <rack>/default-rack</rack>
> <state>SUCCEEDED</state>
> <status>map > sort</status>
> <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
> <diagnostics/>
> <type>MAP</type>
> <assignedContainerId>container_e01_1535363926925_0029_01_000003</assignedContainerId>
> </taskAttempt>
> </taskAttempts>
> {code}
> This happens because we have two different classes in the hierarchy: TaskAttemptInfo for
MAP tasks and ReduceTaskAttemptInfo for REDUCE tasks.
> ReduceTaskAttemptInfo extends TaskAttemptInfo, and later we marshal all tasks (map and reduce)
via TaskAttemptsInfo.getTaskAttempt(). At that point we do not have any information about the
ReduceTaskAttemptInfo type, since we store all tasks in an ArrayList<TaskAttemptInfo>.
> During marshaling, the marshaler sees that the actual type of the task is ReduceTaskAttemptInfo
rather than TaskAttemptInfo and adds meta information for it. That's why we get the duplicated field.
> Unfortunately, we did not catch this earlier in TestHsWebServicesAttempts, since we use the
> org.codehaus.jettison.json.JSONObject library, which overrides duplicated fields. Even
when we use Postman to make the request we get valid JSON; only when we change the representation
type to Raw can we notice the issue. We are also able to reproduce the bug using the "org.json:json"
library:
> Something like this:
> {code:java}
> BufferedReader inReader = new BufferedReader(
>     new InputStreamReader(connection.getInputStream()));
> String inputLine;
> StringBuilder response = new StringBuilder();
> while ((inputLine = inReader.readLine()) != null) {
>     response.append(inputLine);
> }
> inReader.close();
> // org.json's JSONObject rejects duplicate keys, so this throws.
> JSONObject o = new JSONObject(response.toString());
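The lenient-vs-strict parser difference described above can be sketched with stdlib-only Java (a hypothetical illustration, not the actual test code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateKeyDemo {
    public static void main(String[] args) {
        // A trimmed-down version of the broken payload from the ticket.
        String payload =
            "{\"type\": \"reduceTaskAttemptInfo\", "
            + "\"id\": \"attempt_1535363926925_0040_r_000003_0\", "
            + "\"type\": \"REDUCE\"}";

        // Strict view: a duplicate key is an error worth failing on
        // (org.json's JSONObject throws on it).
        Matcher m = Pattern.compile("\"type\"\\s*:").matcher(payload);
        int count = 0;
        while (m.find()) count++;
        System.out.println("occurrences of \"type\": " + count);

        // Lenient view: a Map simply overwrites, hiding the duplicate --
        // which is why Jettison-based tests never noticed the bug.
        Map<String, String> lenient = new HashMap<>();
        lenient.put("type", "reduceTaskAttemptInfo");
        lenient.put("type", "REDUCE");
        System.out.println("lenient parser keeps: " + lenient.get("type"));
    }
}
```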



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

