hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-253) we need speculative execution for reduces
Date Thu, 25 May 2006 16:06:30 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-253?page=comments#action_12413270 ] 

Doug Cutting commented on HADOOP-253:
-------------------------------------

+1

As before, it must be possible to disable speculative execution for reduces with side-effects.
Also, some of the support needs to be in the output format implementations. In particular,
these should first write to a temporary name, then rename the output when the task completes,
but only if the output does not already exist. We should note that this is the expected
behavior for output formats in the interface javadoc, and put as much of the implementation
as possible in the base class.
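
A minimal sketch of the temporary-name-then-rename pattern described above, using java.io.File
for brevity (a real OutputFormat would go through Hadoop's FileSystem abstraction; the class and
method names here are hypothetical):

    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;

    // Illustrative only: each task attempt writes under a temporary,
    // attempt-specific name and renames it to the final name on completion,
    // but only if no other (speculative) attempt has committed first.
    public class TempThenRenameSketch {

      // Write the attempt's output under a temporary name.
      static File writeTemp(File dir, String attemptId, String data) throws IOException {
        File tmp = new File(dir, "_tmp_" + attemptId);
        FileWriter w = new FileWriter(tmp);
        try {
          w.write(data);
        } finally {
          w.close();
        }
        return tmp;
      }

      // Commit: rename to the final name only if the output does not already exist.
      static boolean commit(File tmp, File finalOut) {
        if (finalOut.exists()) {       // another attempt already committed
          tmp.delete();                // discard the duplicate output
          return false;
        }
        return tmp.renameTo(finalOut);
      }

      public static void main(String[] args) throws IOException {
        File dir = new File(".");
        File tmp = writeTemp(dir, "task_0001_r_000291_0", "reduce output\n");
        System.out.println("committed: " + commit(tmp, new File(dir, "part-00291")));
      }
    }

The exists-then-rename check is what makes duplicate (speculative) attempts harmless: whichever
attempt commits first wins, and the loser's output is simply discarded.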


> we need speculative execution for reduces
> -----------------------------------------
>
>          Key: HADOOP-253
>          URL: http://issues.apache.org/jira/browse/HADOOP-253
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.2.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>
> With my new HTTP-based shuffle (on top of the svn head, including Sameer's parallel fetch),
> I just finished sorting 2010 GB on 200 nodes in 8:49 with 9 reduce failures. However, the
> amusing part is that the replacement reduces were _not_ the slow ones: 8 of the original
> reduces were the only things running for the last hour. The job timings looked like:
> Job 0001
>   Total:
>     Tasks: 16551
>     Total: 10056104 secs
>     Average: 607 secs
>     Worst: task_0001_r_000291_0
>     Worst time: 31050 secs
>     Best: task_0001_m_013597_0
>     Best time: 20 secs
>   Maps:
>     Tasks: 16151
>     Total: 2762635 secs
>     Average: 171 secs
>     Worst: task_0001_m_002290_0
>     Worst time: 2663 secs
>     Best: task_0001_m_013597_0
>     Best time: 20 secs
>   Reduces:
>     Tasks: 400
>     Total: 7293469 secs
>     Average: 18233 secs
>     Worst: task_0001_r_000291_0
>     Worst time: 31050 secs
>     Best: task_0001_r_000263_1
>     Best time: 5591 secs
> And the number of tasks run per a node was very uneven:
> #tasks node
> 124 node1161
> 117 node1307
> 117 node1124
> 116 node1253
> 114 node1310
> 111 node1302
> 111 node1299
> 111 node1298
> 111 node1249
> 111 node1221
> 110 node1288
> 110 node1286
> 110 node1211
> 109 node1268
> 108 node1292
> 108 node1202
> 108 node1200
> 107 node1313
> 107 node1277
> 107 node1246
> 107 node1242
> 107 node1231
> 107 node1214
> 106 node1243
> 105 node1251
> 105 node1212
> 105 node1205
> 104 node1272
> 104 node1269
> 104 node1210
> 104 node1203
> 104 node1193
> 104 node1128
> 103 node1300
> 103 node1285
> 103 node1279
> 103 node1209
> 103 node1173
> 103 node1165
> 102 node1276
> 102 node1239
> 102 node1228
> 102 node1204
> 102 node1188
> 101 node1314
> 101 node1303
> 100 node1301
> 100 node1252
> 99 node1287
> 99 node1213
> 99 node1206
> 98 node1295
> 98 node1186
> 97 node1293
> 97 node1265
> 97 node1262
> 97 node1260
> 97 node1258
> 97 node1235
> 97 node1229
> 97 node1226
> 97 node1215
> 97 node1208
> 97 node1187
> 97 node1175
> 97 node1171
> 96 node1291
> 96 node1248
> 96 node1224
> 96 node1216
> 95 node1305
> 95 node1280
> 95 node1263
> 95 node1254
> 95 node1153
> 95 node1115
> 94 node1271
> 94 node1261
> 94 node1234
> 94 node1233
> 94 node1227
> 94 node1225
> 94 node1217
> 94 node1142
> 93 node1275
> 93 node1198
> 93 node1107
> 92 node1266
> 92 node1220
> 92 node1219
> 91 node1309
> 91 node1289
> 91 node1270
> 91 node1259
> 91 node1256
> 91 node1232
> 91 node1179
> 89 node1290
> 89 node1255
> 89 node1247
> 89 node1207
> 89 node1201
> 89 node1190
> 89 node1154
> 89 node1141
> 88 node1306
> 88 node1282
> 88 node1250
> 88 node1222
> 88 node1184
> 88 node1149
> 88 node1117
> 87 node1278
> 87 node1257
> 87 node1191
> 87 node1185
> 87 node1180
> 86 node1297
> 86 node1178
> 85 node1195
> 85 node1143
> 85 node1112
> 84 node1281
> 84 node1274
> 84 node1264
> 83 node1296
> 83 node1148
> 82 node1218
> 82 node1168
> 82 node1167
> 81 node1311
> 81 node1240
> 81 node1223
> 81 node1196
> 81 node1164
> 81 node1116
> 80 node1267
> 80 node1230
> 80 node1177
> 80 node1119
> 79 node1294
> 79 node1199
> 79 node1181
> 79 node1170
> 79 node1166
> 79 node1103
> 78 node1244
> 78 node1189
> 78 node1157
> 77 node1304
> 77 node1172
> 74 node1182
> 71 node1160
> 71 node1147
> 68 node1236
> 68 node1183
> 67 node1245
> 59 node1139
> 58 node1312
> 57 node1162
> 56 node1308
> 56 node1197
> 55 node1146
> 54 node1106
> 53 node1111
> 53 node1105
> 49 node1145
> 49 node1123
> 48 node1176
> 46 node1136
> 44 node1132
> 44 node1125
> 44 node1122
> 44 node1108
> 43 node1192
> 43 node1121
> 42 node1194
> 42 node1138
> 42 node1104
> 41 node1155
> 41 node1126
> 41 node1114
> 40 node1158
> 40 node1151
> 40 node1137
> 40 node1110
> 40 node1100
> 39 node1156
> 38 node1140
> 38 node1135
> 38 node1109
> 37 node1144
> 37 node1120
> 36 node1118
> 34 node1133
> 34 node1113
> 31 node1134
> 26 node1127
> 23 node1101
> 20 node1131
> And it should not surprise us that the last 8 reduces were running on nodes 1134, 1127, 1101,
> and 1131. This really demonstrates the need to run speculative reduces.
> I propose that when the number of running reduce tasks is down to 1/2 the cluster size, we
> start running speculative reduces. I estimate that it would have saved around an hour on
> this run. Does that sound like a reasonable heuristic?
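
For illustration, a sketch of the proposed threshold check (names such as clusterSize and
runningReduces are hypothetical, not actual JobTracker fields):

    // Sketch of the heuristic proposed above: once the number of reduces still
    // running has dropped to half the cluster size or less, idle capacity exists,
    // so speculative duplicates of the remaining reduces can be scheduled.
    public class SpeculativeReduceHeuristic {

      private final int clusterSize;

      public SpeculativeReduceHeuristic(int clusterSize) {
        this.clusterSize = clusterSize;
      }

      public boolean shouldSpeculate(int runningReduces) {
        return runningReduces <= clusterSize / 2;
      }

      public static void main(String[] args) {
        SpeculativeReduceHeuristic h = new SpeculativeReduceHeuristic(200);
        System.out.println(h.shouldSpeculate(8));   // true: the 8-reduce tail above
        System.out.println(h.shouldSpeculate(150)); // false: cluster still mostly busy
      }
    }

On the run above, the worst reduce took 31050 seconds (about 8 hours 37 minutes) out of an
8:49 job, so cutting even part of that tail plausibly accounts for the hour Owen estimates.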

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

