systemml-issues mailing list archives

From "Nakul Jindal (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SYSTEMML-1650) GPU cudnn produces worrisome amount of numerical instability
Date Wed, 31 May 2017 00:30:04 GMT

     [ https://issues.apache.org/jira/browse/SYSTEMML-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nakul Jindal updated SYSTEMML-1650:
-----------------------------------
    Description: 
Running the GPU tests (Mike's run_tests.dml in the nn directory) produces the following output:

{code}

17/05/30 17:24:19 INFO api.DMLScript: BEGIN DML run 05/30/2017 17:24:19
17/05/30 17:24:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/30 17:24:21 INFO context.GPUContext: Initializing CUDA

Starting grad checks.
---
17/05/30 17:24:22 INFO context.GPUContext:  GPU memory - Total: 2096.300032 MB, Available: 1295.9743999999998 MB on GPUContext{deviceNum=0}
17/05/30 17:24:22 INFO context.GPUContext: Total number of GPUs on the machine: 1
Grad checking the cross-entropy loss function.
Grad checking the L1 loss function.
Grad checking the L1 regularization function.
Grad checking the L2 loss function.
Grad checking the L2 regularization function.
Grad checking the log loss function.

Grad checking the affine layer with L2 loss.
 - Grad checking X.
 - Grad checking W.
 - Grad checking b.
Grad checking the 1D batch normalization layer with L2 loss.
 - Grad checking the 'train' mode.
   - Grad checking X.
   - Grad checking gamma.
   - Grad checking beta.
 - Grad checking the 'test' mode.
   - Grad checking X.
   - Grad checking gamma.
   - Grad checking beta.
Grad checking the 2D (spatial) batch normalization layer with L2 loss.
 - Grad checking the 'train' mode.
   - Grad checking X.
   - Grad checking gamma.
   - Grad checking beta.
 - Grad checking the 'test' mode.
   - Grad checking X.
   - Grad checking gamma.
   - Grad checking beta.
Grad checking the `im2col` 2D convolutional layer with L2 loss.
17/05/30 17:24:28 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/05/30 17:24:28 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
 - Grad checking X.
 - Grad checking W.
 - Grad checking b.
Grad checking the built-in 2D convolutional layer with L2 loss.
 - Grad checking X.
 - Grad checking W.
WARNING: Relative error 3.063931109511093E-4 > 1.0E-4 & <= 0.01 with -11.682479557456533 analytical vs -11.689640614065409 numerical, with lossph 40.510115394324195 and lossmh 40.51034918713648
WARNING: Relative error 6.785572589631694E-4 > 1.0E-4 & <= 0.01 with -14.363880156229683 analytical vs -14.383386822913733 numerical, with lossph 40.510088543924184 and lossmh 40.51037621166064
WARNING: Relative error 8.117464157218959E-4 > 1.0E-4 & <= 0.01 with -13.400658690617757 analytical vs -13.378920463225084 numerical, with lossph 40.51009898805432 and lossmh 40.51036656646358
WARNING: Relative error 6.785567321010216E-4 > 1.0E-4 & <= 0.01 with -14.37300870216048 analytical vs -14.39252775057298 numerical, with lossph 40.510088452456074 and lossmh 40.510376303011085
WARNING: Relative error 0.0023065358169588085 > 1.0E-4 & <= 0.01 with -15.081214796672182 analytical vs -15.011804170583785 numerical, with lossph 40.510081360786614 and lossmh 40.510381596870026
WARNING: Relative error 1.2020843619724922E-4 > 1.0E-4 & <= 0.01 with -14.602099111310885 analytical vs -14.60561012436301 numerical, with lossph 40.51008637609418 and lossmh 40.510378488296666
WARNING: Relative error 3.063921242335014E-4 > 1.0E-4 & <= 0.01 with -11.654549775926586 analytical vs -11.66169368929104 numerical, with lossph 40.51011567395115 and lossmh 40.510348907824934
 - Grad checking b.
Grad checking the simple reference 2D convolutional layer with L2 loss.
 - Grad checking X.
 - Grad checking W.
 - Grad checking b.
Grad checking the 2D convolution transpose layer with L2 loss.
 - Grad checking X.
WARNING: Relative error 6.785553488451468E-4 > 1.0E-4 & <= 0.01 with 0.2480096633484488 analytical vs 0.2483464684566172 numerical, with lossph 8.25432627928163 and lossmh 8.25432131235226
WARNING: Relative error 8.117342148227497E-4 > 1.0E-4 & <= 0.01 with 0.46178385247729725 analytical vs 0.4610347690281457 numerical, with lossph 8.254328419943578 and lossmh 8.254319199248197
WARNING: Relative error 6.78555370922306E-4 > 1.0E-4 & <= 0.01 with 0.5511303465906289 analytical vs 0.5518787993707974 numerical, with lossph 8.254329314621874 and lossmh 8.254318277045886
WARNING: Relative error 8.117020436730868E-4 > 1.0E-4 & <= 0.01 with 0.13829553169194606 analytical vs 0.13807120424758068 numerical, with lossph 8.254325180655963 and lossmh 8.254322419231878
WARNING: Relative error 8.117328203683862E-4 > 1.0E-4 & <= 0.01 with 0.5055309144436196 analytical vs 0.5047108680322765 numerical, with lossph 8.25432885801433 and lossmh 8.25431876379697
WARNING: Relative error 8.116455308945274E-4 > 1.0E-4 & <= 0.01 with 0.06899396037823916 analytical vs 0.06888205392741042 numerical, with lossph 8.254324486699112 and lossmh 8.254323109058033
WARNING: Relative error 6.785554871822532E-4 > 1.0E-4 & <= 0.01 with -0.13350593809809497 analytical vs -0.1336872434976044 numerical, with lossph 8.254322458935555 and lossmh 8.254325132680425
WARNING: Relative error 6.785552242935504E-4 > 1.0E-4 & <= 0.01 with -0.2724052650635402 analytical vs -0.2727752001163708 numerical, with lossph 8.254321068071604 and lossmh 8.254326523575607
WARNING: Relative error 6.785555175584701E-4 > 1.0E-4 & <= 0.01 with -0.2904759680044567 analytical vs -0.29087044381981286 numerical, with lossph 8.254320887119167 and lossmh 8.254326704528044
WARNING: Relative error 8.117365268798438E-4 > 1.0E-4 & <= 0.01 with 0.3720728122335215 analytical vs 0.37146925198072717 numerical, with lossph 8.254327521604598 and lossmh 8.254320092219558
WARNING: Relative error 8.117996956316842E-4 > 1.0E-4 & <= 0.01 with 0.14788594799412863 analytical vs 0.1476460352201059 numerical, with lossph 8.254325267869831 and lossmh 8.254322314949126
WARNING: Relative error 8.119012708962542E-4 > 1.0E-4 & <= 0.01 with -0.07795973031927872 analytical vs -0.07783324180721252 numerical, with lossph 8.254323015177858 and lossmh 8.254324571842695
WARNING: Relative error 8.117353519435853E-4 > 1.0E-4 & <= 0.01 with 0.48348549268368723 analytical vs 0.48270120478477446 numerical, with lossph 8.254328637254696 and lossmh 8.2543189832306
WARNING: Relative error 6.785553268852095E-4 > 1.0E-4 & <= 0.01 with 0.4883649684844016 analytical vs 0.48902818381435503 numerical, with lossph 8.254328686121752 and lossmh 8.254318905558076
WARNING: Relative error 8.117788275473475E-4 > 1.0E-4 & <= 0.01 with 0.3804938800778617 analytical vs 0.37987662739880074 numerical, with lossph 8.254327583249065 and lossmh 8.254319985716517
WARNING: Relative error 6.785547468313855E-4 > 1.0E-4 & <= 0.01 with 0.07631309322132188 analytical vs 0.07641672876701477 numerical, with lossph 8.254324560007202 and lossmh 8.254323031672627
WARNING: Relative error 6.785553971200803E-4 > 1.0E-4 & <= 0.01 with -0.342925836087173 analytical vs -0.34339154044715764 numerical, with lossph 8.254320361892585 and lossmh 8.254327229723394
WARNING: Relative error 6.785551565868137E-4 > 1.0E-4 & <= 0.01 with -0.02491798838300011 analytical vs -0.02495182780393179 numerical, with lossph 8.254323546295183 and lossmh 8.254324045331739
WARNING: Relative error 8.117758660292993E-4 > 1.0E-4 & <= 0.01 with -0.29017371666857017 analytical vs -0.2897029867554579 numerical, with lossph 8.254320890135642 and lossmh 8.254326684195377
WARNING: Relative error 6.785553229289772E-4 > 1.0E-4 & <= 0.01 with -0.5072646368526688 analytical vs -0.5079535185359418 numerical, with lossph 8.254318716279274 and lossmh 8.254328875349644
WARNING: Relative error 6.78554778315526E-4 > 1.0E-4 & <= 0.01 with 0.03126357863101518 analytical vs 0.031306035541689425 numerical, with lossph 8.254324108883962 and lossmh 8.254323482763251
WARNING: Relative error 6.785553099932536E-4 > 1.0E-4 & <= 0.01 with -0.2942630860026224 analytical vs -0.2946627047251127 numerical, with lossph 8.254320849187412 and lossmh 8.254326742441506
WARNING: Relative error 6.785551200008403E-4 > 1.0E-4 & <= 0.01 with 0.08135480331223598 analytical vs 0.08146528571728595 numerical, with lossph 8.254324610476463 and lossmh 8.254322981170748
 - Grad checking W.
WARNING: Relative error 1.2021043996600452E-4 > 1.0E-4 & <= 0.01 with -0.6822178752109117 analytical vs -0.6823819143519926 numerical, with lossph 8.254316974560194 and lossmh 8.25433062219848
WARNING: Relative error 3.0638425696629187E-4 > 1.0E-4 & <= 0.01 with -1.4508166976784973 analytical vs -1.4517059849339373 numerical, with lossph 8.254309268100457 and lossmh 8.254338302220155
WARNING: Relative error 8.117943113692949E-4 > 1.0E-4 & <= 0.01 with 0.7308501955876874 analytical vs 0.7296645580190385 numerical, with lossph 8.254331070748734 and lossmh 8.254316477457573
WARNING: Relative error 8.117065202066276E-4 > 1.0E-4 & <= 0.01 with 0.9946502709906734 analytical vs 0.9930368523924925 numerical, with lossph 8.254333755980372 and lossmh 8.254313895243325
WARNING: Relative error 3.063898473026446E-4 > 1.0E-4 & <= 0.01 with -1.9666799496457117 analytical vs -1.967885460540941 numerical, with lossph 8.254304102377066 and lossmh 8.254343460086277
WARNING: Relative error 3.063873162245538E-4 > 1.0E-4 & <= 0.01 with -1.4461219772325682 analytical vs -1.4470083956830135 numerical, with lossph 8.254309315051604 and lossmh 8.254338255219517
WARNING: Relative error 1.2020330860452122E-4 > 1.0E-4 & <= 0.01 with -1.2286387562580745 analytical vs -1.228934164654305 numerical, with lossph 8.254311502036288 and lossmh 8.25433608071958
WARNING: Relative error 6.785553661541439E-4 > 1.0E-4 & <= 0.01 with 0.08818615423020833 analytical vs 0.08830591387010144 numerical, with lossph 8.254324678957262 and lossmh 8.254322912838985
 - Grad checking b.
Grad checking the (inverted) dropout layer with L2 loss.
Grad checking the LSTM layer with L2 loss.
 - Grad checking X.
 - Grad checking W.
 - Grad checking b.
 - Grad checking out0.
 - Grad checking c0.
Grad checking the 2D max pooling layer with L2 loss.
 - Grad checking w/ pad=0.
 - Grad checking w/ pad=1.
Grad checking the built-in 2D max pooling layer with L2 loss.
 - Grad checking w/ pad=0.
 - Grad checking w/ pad=1.
Grad checking the simple reference 2D max pooling layer with L2 loss.
 - Grad checking w/ pad=0.
 - Grad checking w/ pad=1.
Grad checking the ReLU nonlinearity layer with L2 loss.
Grad checking the simple RNN layer with L2 loss.
 - Grad checking X.
 - Grad checking W.
 - Grad checking b.
 - Grad checking out0.
Grad checking the 1D scale & shift layer with L2 loss.
 - Grad checking X.
 - Grad checking gamma.
 - Grad checking beta.
Grad checking the 2D scale & shift layer with L2 loss.
 - Grad checking X.
 - Grad checking gamma.
 - Grad checking beta.
Grad checking the sigmoid nonlinearity layer with L2 loss.
Grad checking the softmax layer with L2 loss.
Grad checking the tanh nonlinearity layer with L2 loss.


---
Grad checks complete -- look for any ERRORs or WARNINGs.
If any tests involving ReLUs failed, try a few times to ensure that they were not false negatives due to kinks being crossed.


Starting other tests.
---
Testing the 1D batch normalization function.
Testing the 2D (spatial) batch normalization function.
Testing the 2D convolution functions.
ERROR: Relative error 1.4275242179409038E-8 > 1.0E-10 with 0.2613148690102816 vs 0.26131486154961564.
ERROR: Relative error 5.19998536442815E-10 > 1.0E-10 with -1.0332757339265042 vs -1.033275735001108.
ERROR: Relative error 1.6690477019584457E-9 > 1.0E-10 with -0.2796655022366367 vs -0.2796655013030866.
ERROR: Relative error 4.026622598319469E-8 > 1.0E-10 with -0.5810464384666573 vs -0.5810463916735648.
ERROR: Relative error 1.5925147093041632E-8 > 1.0E-10 with 0.3443370985342769 vs 0.34433710950151497.
ERROR: Relative error 1.527092464737425E-8 > 1.0E-10 with 0.29907123360418936 vs 0.29907122447000095.
ERROR: Relative error 1.6981187364236183E-8 > 1.0E-10 with -1.3172908215225392 vs -1.3172908662608644.
ERROR: Relative error 1.1249863733341583E-9 > 1.0E-10 with 2.372276123264216 vs 2.3722761179266594.
ERROR: Relative error 2.0761911589432542E-8 > 1.0E-10 with 1.2022032376875469 vs 1.2022031877674733.
ERROR: Relative error 1.565994385459594E-8 > 1.0E-10 with -3.2028287262088755 vs -3.202828826521113.
ERROR: Relative error 1.6864676187009944E-8 > 1.0E-10 with -1.1617410552745209 vs -1.1617410160897481.
ERROR: Relative error 6.7621761573523795E-9 > 1.0E-10 with -2.357698874691257 vs -2.3576989065776073.
ERROR: Relative error 8.077058791206047E-9 > 1.0E-10 with -0.826672150112067 vs -0.8266721634662262.
ERROR: Relative error 3.126862662452838E-7 > 1.0E-10 with 0.13533522445012097 vs 0.13533530908507949.
ERROR: Relative error 1.4938685572516403E-8 > 1.0E-10 with 1.6979405686913527 vs 1.697940619421354.
ERROR: Relative error 1.5016745260435074E-9 > 1.0E-10 with -1.1265715161920746 vs -1.1265715128085871.
ERROR: Relative error 1.57421411011751E-8 > 1.0E-10 with -1.4288550242468203 vs -1.4288549792603462.
ERROR: Relative error 5.967223900831169E-9 > 1.0E-10 with 3.3608783218897167 vs 3.360878361999944.
ERROR: Relative error 2.6680716508589268E-8 > 1.0E-10 with -1.017766223123622 vs -1.0177662774330876.
ERROR: Relative error 2.982949129497961E-8 > 1.0E-10 with -0.36147647765070573 vs -0.36147649921602526.
ERROR: Relative error 3.657840826605735E-8 > 1.0E-10 with 0.45134768685895427 vs 0.4513477198781154.
ERROR: Relative error 1.4907379969827675E-8 > 1.0E-10 with -1.5573016894495448 vs -1.5573017358801216.
ERROR: Relative error 1.5786845592838222E-8 > 1.0E-10 with 0.14299706440454166 vs 0.14299706891948688.
ERROR: Relative error 5.200703379619866E-10 > 1.0E-10 with 1.9947644445978157 vs 1.99476444252298.
ERROR: Relative error 6.52328960308163E-9 > 1.0E-10 with -0.9699901535166611 vs -0.9699901661717145.
ERROR: Relative error 1.1700565471480131E-8 > 1.0E-10 with 0.6822547438763443 vs 0.6822547598418771.
ERROR: Relative error 1.7446876565463836E-10 > 1.0E-10 with 1.0078137191134946 vs 1.0078137194651586.
ERROR: Relative error 1.1440123355883945E-8 > 1.0E-10 with 0.2932107414338165 vs 0.29321074814255066.
ERROR: Relative error 3.136300157201714E-8 > 1.0E-10 with -0.12095057939678663 vs -0.1209505869835333.
ERROR: Relative error 6.045675521257672E-9 > 1.0E-10 with -1.6284907233105383 vs -1.6284907036198855.
ERROR: Relative error 1.9960135534777884E-7 > 1.0E-10 with 0.05796215971397543 vs 0.05796218285263132.
ERROR: Relative error 1.4599148103004017E-8 > 1.0E-10 with -1.5695035918656892 vs -1.5695036376925207.
ERROR: Relative error 6.543527908682189E-9 > 1.0E-10 with -1.0879498840683728 vs -1.0879498983064337.
ERROR: Relative error 2.863818344889719E-8 > 1.0E-10 with 0.986026344823579 vs 0.9860264012995873.
ERROR: Relative error 1.1056654568999266E-8 > 1.0E-10 with 0.24707866293721786 vs 0.24707866840094478.
ERROR: Relative error 6.455137924885195E-8 > 1.0E-10 with -0.40055946263778364 vs -0.4005594109244554.
ERROR: Relative error 1.639718543077429E-8 > 1.0E-10 with -2.1781136614834082 vs -2.178113590053542.
ERROR: Relative error 4.8798430636206827E-8 > 1.0E-10 with 0.05076196924826959 vs 0.0507619742024787.
ERROR: Relative error 9.165908471957055E-9 > 1.0E-10 with 1.8794282546494758 vs 1.8794282201961414.
ERROR: Relative error 2.978816706402979E-8 > 1.0E-10 with 0.3975604976144726 vs 0.39756052129967034.
ERROR: Relative error 2.963621786230762E-8 > 1.0E-10 with 0.6415698417229411 vs 0.6415698797503494.
ERROR: Relative error 2.3109709555514415E-8 > 1.0E-10 with -1.1982869673393794 vs -1.1982870227235083.
ERROR: Relative error 3.2115462610775645E-10 > 1.0E-10 with -2.0857041042357167 vs -2.0857041055753838.
ERROR: Relative error 9.948630979193247E-8 > 1.0E-10 with -0.10622623825604993 vs -0.10622625939216493.
ERROR: Relative error 6.778314445963566E-8 > 1.0E-10 with 0.2196231226007214 vs 0.21962309282723172.
ERROR: Relative error 2.5809976150965856E-9 > 1.0E-10 with 2.588105950679918 vs 2.5881059640397086.
ERROR: Relative error 9.087554361084105E-8 > 1.0E-10 with 0.09544286378739902 vs 0.09544284644055634.
ERROR: Relative error 1.8858614351612048E-8 > 1.0E-10 with -0.5589644431314431 vs -0.5589644220488538.
ERROR: Relative error 7.736763250891607E-8 > 1.0E-10 with 0.04561424549587588 vs 0.04561425255400879.
ERROR: Relative error 2.4394587006452004E-8 > 1.0E-10 with 0.8068575646006609 vs 0.806857603966576.
ERROR: Relative error 1.2147537773849554E-8 > 1.0E-10 with -1.2798666770117564 vs -1.2798666459172992.
ERROR: Relative error 4.395188501268004E-9 > 1.0E-10 with 0.77094736022815 vs 0.7709473534512321.
ERROR: Relative error 4.273504230193166E-8 > 1.0E-10 with -0.04529170878696899 vs -0.04529171265805534.
ERROR: Relative error 5.193072042217285E-9 > 1.0E-10 with 0.26683412238923976 vs 0.26683411961786213.
ERROR: Relative error 5.044924487623104E-9 > 1.0E-10 with -0.743458732252303 vs -0.7434587247509167.
ERROR: Relative error 1.7148401756786993E-9 > 1.0E-10 with 0.8849261094606061 vs 0.8849261064255924.
ERROR: Relative error 5.823286681596099E-10 > 1.0E-10 with -2.926710388969381 vs -2.9267103855607663.
ERROR: Relative error 4.8325979184277686E-9 > 1.0E-10 with 1.8593052545877329 vs 1.8593052725582824.
ERROR: Relative error 6.601979216797429E-8 > 1.0E-10 with 0.5235158749686739 vs 0.523515944093497.
ERROR: Relative error 3.434129104490014E-8 > 1.0E-10 with -1.062474946283946 vs -1.0624750192574712.
ERROR: Relative error 1.8295696384999767E-8 > 1.0E-10 with -0.2937149434414278 vs -0.29371493269398913.
ERROR: Relative error 1.0063200936207417E-9 > 1.0E-10 with 1.119269021693839 vs 1.1192690239465248.
ERROR: Relative error 4.809906625420993E-9 > 1.0E-10 with 1.5742089463824267 vs 1.5742089615260229.
ERROR: Relative error 9.943604666704722E-9 > 1.0E-10 with 0.8775827136936241 vs 0.8775827311462954.
ERROR: Relative error 6.3483134114491796E-9 > 1.0E-10 with -0.698009708220039 vs -0.6980096993576703.
ERROR: Relative error 4.535157418604792E-8 > 1.0E-10 with -0.1749102977141979 vs -0.17491028184928392.
ERROR: Relative error 2.2360276379519686E-9 > 1.0E-10 with -0.8108452879638077 vs -0.8108452843376628.
ERROR: Relative error 1.0374420129323381E-8 > 1.0E-10 with -0.8834414065663451 vs -0.8834413882359606.
ERROR: Relative error 1.1878019489316737E-9 > 1.0E-10 with 2.532993506117655 vs 2.5329935121350444.
ERROR: Relative error 2.70754303169682E-7 > 1.0E-10 with 0.0717457323097893 vs 0.07174577116073133.
ERROR: Relative error 9.53853266322461E-10 > 1.0E-10 with -0.6740828968349293 vs -0.6740828955489769.
ERROR: Relative error 5.7021836168127075E-9 > 1.0E-10 with -1.2861467707448162 vs -1.2861467854125064.
ERROR: Relative error 7.72738725803068E-9 > 1.0E-10 with 0.6463938706474603 vs 0.6463938806373319.
ERROR: Relative error 1.6274098482424904E-8 > 1.0E-10 with 0.3306564558987715 vs 0.33065646666104315.
ERROR: Relative error 3.103682986912765E-8 > 1.0E-10 with -0.6970667742796319 vs -0.697066817549119.
ERROR: Relative error 1.535686164274709E-8 > 1.0E-10 with 0.701665021104026 vs 0.701664999553281.
ERROR: Relative error 4.100056878916496E-9 > 1.0E-10 with -2.1103616009334836 vs -2.110361618238689.
ERROR: Relative error 8.123120975873666E-9 > 1.0E-10 with 1.2604178829093742 vs 1.2604179033864282.
ERROR: Relative error 1.0941209833039406E-8 > 1.0E-10 with 1.2266289837112507 vs 1.2266290105528612.
ERROR: Relative error 5.541442492773271E-7 > 1.0E-10 with -0.012777216573308391 vs -0.012777202412474067.
ERROR: Relative error 4.2253762431563665E-9 > 1.0E-10 with -1.339045610669553 vs -1.339045621985496.
ERROR: Relative error 8.360566732946797E-8 > 1.0E-10 with 0.1561570755476005 vs 0.1561571016588357.
ERROR: Relative error 2.767267007771319E-9 > 1.0E-10 with -2.6167992977337984 vs -2.616799312216563.
ERROR: Relative error 1.7289233542875815E-8 > 1.0E-10 with 1.0579964137565983 vs 1.057996450340493.
ERROR: Relative error 1.7154762078446898E-8 > 1.0E-10 with -0.5334469910041575 vs -0.5334469727018454.
ERROR: Relative error 3.110409084496228E-9 > 1.0E-10 with 1.7670386817474821 vs 1.7670386707550558.
ERROR: Relative error 1.1173482982155575E-8 > 1.0E-10 with -1.1253843105032257 vs -1.125384335652151.
ERROR: Relative error 9.285717485307979E-9 > 1.0E-10 with -1.1188404636483114 vs -1.1188404428698386.
ERROR: Relative error 2.1555251644095785E-9 > 1.0E-10 with 1.1254590576532526 vs 1.1254590625051633.
ERROR: Relative error 2.319234552374325E-8 > 1.0E-10 with 0.9556977590880147 vs 0.955697803417761.
ERROR: Relative error 4.4763326839679915E-9 > 1.0E-10 with 1.591009326384972 vs 1.5910093406287462.
ERROR: Relative error 9.487915016929957E-9 > 1.0E-10 with 0.5884221509707697 vs 0.588422139804971.
ERROR: Relative error 1.7764401531913022E-8 > 1.0E-10 with 0.916673465359319 vs 0.9166734979276306.
ERROR: Relative error 2.3563316249488676E-8 > 1.0E-10 with -0.13562099666658628 vs -0.13562099027522556.
ERROR: Relative error 4.394744814951474E-9 > 1.0E-10 with 2.0302516382886835 vs 2.0302516561335593.
ERROR: Relative error 6.460001556392016E-10 > 1.0E-10 with 1.3949844117223749 vs 1.3949844099200546.
ERROR: Relative error 8.778931914804406E-9 > 1.0E-10 with -1.2093561879421602 vs -1.2093562091758716.
ERROR: Relative error 1.3404298361036346E-9 > 1.0E-10 with -1.7649810877732333 vs -1.7649810830415666.
ERROR: Relative error 8.428228195211393E-9 > 1.0E-10 with 2.497793466520835 vs 2.497793508624782.
ERROR: Relative error 1.0284037808289523E-8 > 1.0E-10 with 0.8600743385970706 vs 0.8600743209069968.
Testing the 2D convolution transpose function.
Testing the cross-entropy loss function with zero-valued predictions.
Testing the im2col and col2im functions.
Testing the 2D max pooling functions.
 - Testing w/ padh=0 & padw=0.
 - Testing w/ padh=0 & padw=1.
 - Testing w/ padh=0 & padw=2.
 - Testing w/ padh=0 & padw=3.
 - Testing w/ padh=1 & padw=0.
 - Testing w/ padh=1 & padw=1.
 - Testing w/ padh=1 & padw=2.
 - Testing w/ padh=1 & padw=3.
 - Testing w/ padh=2 & padw=0.
 - Testing w/ padh=2 & padw=1.
 - Testing w/ padh=2 & padw=2.
 - Testing w/ padh=2 & padw=3.
 - Testing w/ padh=3 & padw=0.
 - Testing w/ padh=3 & padw=1.
 - Testing w/ padh=3 & padw=2.
 - Testing w/ padh=3 & padw=3.
 - Testing for correct behavior against known answer w/ pad=0.
 - Testing for correct behavior against known answer w/ pad=1.
 - Testing for correct behavior against known answer w/ all negative matrix w/ pad=0.
 - Testing for correct behavior against known answer w/ all negative matrix w/ pad=1.
Testing the padding and unpadding functions.
Testing the tanh forward function.
---
Other tests complete -- look for any ERRORs or WARNINGs.


17/05/30 17:26:25 INFO api.DMLScript: END DML run 05/30/2017 17:26:25
SystemML Statistics:
Total elapsed time:		126.751 sec.
Total compilation time:		2.136 sec.
Total execution time:		124.615 sec.
Number of compiled MR Jobs:	0.
Number of executed MR Jobs:	0.
CUDA/CuLibraries init time:	1.086/0.985 sec.
Number of executed GPU inst:	552273.
GPU mem tx time  (alloc/dealloc/set0/toDev/fromDev):	0.032/0.002/6.738/29.418/16.843 sec.
GPU mem tx count (alloc/dealloc/set0/toDev/fromDev/evict):	221/221/972795/532/402390/237544/0.
GPU conversion time  (sparseConv/sp2dense/dense2sp):	0.001/0.037/0.000 sec.
GPU conversion count (sparseConv/sp2dense/dense2sp):	532/561/0.
Cache hits (Mem, WB, FS, HDFS):	2296853/0/0/0.
Cache writes (WB, FS, HDFS):	23912/0/0.
Cache times (ACQr/m, RLS, EXP):	18.229/0.952/0.666/0.000 sec.
HOP DAGs recompiled (PRED, SB):	0/0.
HOP DAGs recompile time:	0.053 sec.
Functions recompiled:		6501.
Functions recompile time:	12.265 sec.
ParFor loops optimized:		1235.
ParFor optimize time:		2.541 sec.
ParFor initialize time:		0.092 sec.
ParFor result merge time:	0.003 sec.
ParFor total update in-place:	0/288348/367740
Total JIT compile time:		39.733 sec.
Total JVM GC count:		75.
Total JVM GC time:		0.248 sec.
LibMatrixDNN dense count (conv/bwdF/bwdD/im2col/maxBwd):	0/0/0/0/0.
LibMatrixDNN sparse count (conv/bwdF/bwdD/im2col/maxBwd):	0/0/0/0/0.
LibMatrixDNN conv(im2col/matmult), bwdF (im2col/matmult), bwdD (col2im/matmult) time:	0.000/0.000/0.000/0.000/0.000/0.000 sec.
Heavy hitter instructions:
   #  Instruction           Time(s)    Count  GPU
   1  forward               111.231     7027  
   2  lstm                   56.371        1  
   3  gpu_*                  34.008   208803  s2d[0.000s,2], mmck[5.535s,75808], msk[9.984s,132995], ao[1.938s,208803], H2D[15.177s,209727]
   4  conv2d_simple          19.747        1  
   5  rnn                    14.719        1  
   6  max_pool2d             12.343        2  
   7  gpu_ba+*               12.131    71493  H2D[7.586s,99457], Mdmdm[0.623s,43613], ao[0.721s,71493], Mddot[2.555s,27880]
   8  leftIndex              10.275   367740  
   9  conv2d                  7.951        2  
  10  gpu_+                   7.352    98170  s2d[0.035s,542], msk[0.196s,2787], ddgeaml[0.319s,24089], D2D[0.383s,39732], ao[0.975s,98170], H2D[2.294s,29303], mmck[2.367s,31562]
  11  gpu_-                   6.977    80792  mmck[0.153s,2067], ddgeaml[0.179s,13471], msk[4.702s,65254], H2D[0.482s,6076], ao[0.896s,80792]
  12  sigmoid                 5.261    88549  
  13  backward                4.320       44  
  14  gpu_uamax               4.054    15628  r[0.002s,15628], az[0.093s,15628], D2H[1.046s,15628], rallk[1.313s,15628], H2D[1.469s,15628]
  15  max_pool2d_simple       4.010        1  
  16  rmvar                   3.585  5694332  
  17  gpu_r'                  3.118    25395  ao[0.255s,25394], ddgeaml[0.303s,25394], H2D[2.411s,25395]
  18  batch_norm1d            2.906        2  
  19  rangeReIndex            2.716   507414  
  20  gpu_uarsqk+             2.296    12662  a[0.000s,1], r[0.003s,12661], az[0.058s,12662], ao[0.139s,12662], rrowk[0.883s,12662], msk[1.052s,12662]
  21  gpu_uak+                2.164    13551  a[0.000s,1], s2d[0.001s,9], H2D[0.004s,18], r[0.005s,13550], az[0.105s,13551], D2H[0.946s,13551], rallk[0.966s,13551]
  22  rshape                  2.057   345346  
  23  affine                  1.425        1  
  24  batch_norm2d            1.350        2  
  25  gpu_+*                  1.229     8949  daxpymv[0.000s,2], s2d[0.000s,2], D2D[0.078s,8947], daxpy[0.088s,8947], ao[0.088s,8949], H2D[0.907s,12024]
  26  createvar               0.922  1945781  
  27  scale_shift1d           0.773        1  
  28  rand                    0.759    67235  
  29  dropout                 0.546        1  
  30  gpu_uacvar              0.493     1312  r[0.000s,3929], a[0.001s,7], ao[0.014s,1312], az[0.019s,3936], mmck[0.091s,1312], rcolk[0.170s,2624], msk[0.173s,2624]
  31  gpu_/                   0.304     3325  H2D[0.000s,1], ao[0.037s,3325], msk[0.096s,1252], mmck[0.147s,2073]
  32  scale_shift2d           0.242        1  
  33  *                       0.238   978652  
  34  conv2d_builtin          0.232        1  
  35  gpu_sqrt                0.216     2626  ao[0.028s,2626], sqrtk[0.168s,2626]
  36  max_pool2d_builtin      0.204        1  
  37  gpu_bias_add            0.202     1700  s2d[0.000s,4], ao[0.019s,1700], H2D[0.021s,323], nnrbk[0.126s,1700]
  38  +                       0.190   753849  
  39  gpu_uacmean             0.177     1312  ao[0.015s,1312], H2D[0.060s,906], rcolk[0.092s,1312]
  40  check_rel_grad_error    0.177     6001  
  41  gpu_bias_multiply       0.175     1553  ao[0.017s,1553], H2D[0.025s,319], nnrbk[0.120s,1553]
  42  col2im_t259             0.165        2  
  43  ncol                    0.152   404715  
  44  gpu_uacmax              0.149     1144  ao[0.012s,1144], rcolk[0.119s,1144]
  45  -                       0.147   539042  
  46  cpvar                   0.138   563847  
  47  append                  0.127    40461  
  48  gpu_uark+               0.123      920  H2D[0.004s,15], ao[0.014s,920], rrowk[0.091s,920]
  49  im2col                  0.120        2  
  50  cross_entropy_loss      0.119        2  
  51  check_rel_error         0.118    18454  
  52  gpu_uarvar              0.115      310  r[0.000s,925], a[0.001s,5], ao[0.003s,310], az[0.004s,930], mmck[0.020s,310], msk[0.040s,620], rrowk[0.040s,620]
  53  conv2d_transpose        0.105        2  
  54  gpu_^2                  0.104      814  H2D[0.002s,30], ao[0.008s,814], msk[0.086s,814]
  55  gpu_uarmean             0.095      620  ao[0.006s,620], rrowk[0.041s,620], H2D[0.044s,620]
  56  col2im                  0.084        1  
  57  nrow                    0.081   162537  
  58  gpu_conv2d_bias_add     0.081      278  s2d[0.000s,2], nnc[0.000s,278], nni[0.003s,278], ao[0.003s,278], nncf[0.009s,278], H2D[0.020s,281], nnrbk[0.034s,278]
  59  tanh                    0.065        2  
  60  log_loss                0.064        1  
  61  softmax                 0.062        1  
  62  gpu_maxpooling          0.052      278  nnc[0.000s,278], nni[0.002s,278], ao[0.004s,278], nnmf[0.004s,278], H2D[0.037s,260]
  63  im2col_t26284           0.051        3  
  64  im2col_t26282           0.045        3  
  65  im2col_t228             0.043        2  
  66  im2col_t259             0.038        2  
  67  im2col_t25918           0.037        3  
  68  ==                      0.036     9393  
  69  im2col_t25920           0.035        3  
  70  im2col_t26193           0.035        3  
  71  castdts                 0.034    55103  
  72  im2col_t26191           0.034        3  
  73  relu                    0.033        1  
  74  im2col_t342             0.032        2  
  75  im2col_t26102           0.031        3  
  76  im2col_t26100           0.031        3  
  77  im2col_t25827           0.030        3  
  78  im2col_t25829           0.030        3  
  79  im2col_t26011           0.030        3  
  80  im2col_t434             0.030        2  
  81  im2col_t25192           0.029        3  
  82  im2col_t25554           0.029        3  
  83  im2col_t457             0.029        2  
  84  assignvar               0.029   144209  
  85  im2col_t17008           0.029        2  
  86  l2_reg                  0.029        1  
  87  im2col_t595             0.029        2  
  88  im2col_t25465           0.028        3  
  89  im2col_t4804            0.028        2  
  90  im2col_t388             0.028        2  
  91  im2col_t25556           0.027        3  
  92  im2col_t24923           0.027        3  
  93  im2col_t25190           0.027        3  
  94  l1_reg                  0.027        1  
  95  im2col_t25738           0.027        3  
  96  im2col_t25463           0.027        3  
  97  im2col_t25736           0.026        3  
  98  im2col_t26009           0.026        3  
  99  im2col_t365             0.026        2  
 100  im2col_t411             0.026        2  
{code}
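
The logged figures can be cross-checked by hand. The sketch below is not taken from the SystemML sources: it assumes the relative error is the symmetric form |x1 - x2| / (|x1| + |x2|) and that the numerical gradient is a central difference with step h = 1e-5. Both assumptions are inferred only from the values printed in the log above, and the function names `rel_error` and `central_difference` are hypothetical.

```python
import math


def rel_error(x1, x2):
    """Symmetric relative error |x1 - x2| / (|x1| + |x2|).

    Assumed formula, inferred from the logged values rather than
    read from the test utilities themselves.
    """
    denom = abs(x1) + abs(x2)
    return 0.0 if denom == 0.0 else abs(x1 - x2) / denom


def central_difference(loss_plus, loss_minus, h):
    """Numerical gradient from the losses at (x + h) and (x - h)."""
    return (loss_plus - loss_minus) / (2.0 * h)


# First WARNING under "built-in 2D convolutional layer", "Grad checking W".
analytical = -11.682479557456533
numerical_logged = -11.689640614065409
lossph = 40.510115394324195
lossmh = 40.51034918713648
H = 1e-5  # assumed step size, inferred from lossph/lossmh above

numerical = central_difference(lossph, lossmh, H)
grad_err = rel_error(analytical, numerical_logged)

# First ERROR under "Testing the 2D convolution functions".
fwd_err = rel_error(0.2613148690102816, 0.26131486154961564)

print("recomputed numerical gradient:", numerical)
print("gradient-check relative error:", grad_err)
print("forward-test relative error:  ", fwd_err)
```

Under these assumptions the recomputed values agree with the logged ones to within floating-point rounding, so the WARNING and ERROR lines reflect real differences in the computed values rather than a reporting artifact.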

Ping [~mwdusenb@us.ibm.com], [~niketanpansare]


  was:
Running the GPU tests (Mike's run_tests.dml in the nn directory) produces the following output:

{code}
/usr/lib/jvm/java-8-oracle/bin/java -Xmx8g -Xms4g -Xmn1g -Dlog4j.configuration=file:/home/njindal/git/incubator-systemml/conf/log4j.properties -Duser.dir=/home/njindal/git/incubator-systemml/temp -javaagent:/home/njindal/idea-IC-171.4249.39/lib/idea_rt.jar=44905:/home/njindal/idea-IC-171.4249.39/bin -Dfile.encoding=UTF-8 -classpath /usr/lib/jvm/java-8-oracle/jre/lib/charsets.jar:/usr/lib/jvm/java-8-oracle/jre/lib/deploy.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/cldrdata.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/dnsns.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/jaccess.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/jfxrt.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/localedata.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/nashorn.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/sunec.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/sunjce_provider.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/sunpkcs11.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/zipfs.jar:/usr/lib/jvm/java-8-oracle/jre/lib/javaws.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jce.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jfr.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jfxswt.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jsse.jar:/usr/lib/jvm/java-8-oracle/jre/lib/management-agent.jar:/usr/lib/jvm/java-8-oracle/jre/lib/plugin.jar:/usr/lib/jvm/java-8-oracle/jre/lib/resources.jar:/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar:/home/njindal/git/incubator-systemml/target/classes:/home/njindal/.m2/repository/com/google/protobuf/protobuf-java/3.2.0/protobuf-java-3.2.0.jar:/home/njindal/.m2/repository/org/jcuda/jcuda/0.8.0/jcuda-0.8.0.jar:/home/njindal/.m2/repository/org/jcuda/jcublas/0.8.0/jcublas-0.8.0.jar:/home/njindal/.m2/repository/org/jcuda/jcufft/0.8.0/jcufft-0.8.0.jar:/home/njindal/.m2/repository/org/jcuda/jcusparse/0.8.0/jcusparse-0.8.0.jar:/home/njindal/.m2/repository/org/jcuda/jcusolver/0.8.0/jcusolver-0.8.0.jar:/home/njindal/.m2/repository/org/jcuda/jcurand/0.8.0/jcurand-0.8.0.jar:/home/njindal/.m2/repository/org/jcuda/jnvgraph/0.8.0/jnvgraph-0.8.0.jar:/home/njindal/.m2/repository/org/jcuda/jcudnn/0.8.0/jcudnn-0.8.0.jar:/home/njindal/.m2/repository/org/jcuda/jcuda-natives/0.8.0/jcuda-natives-0.8.0-linux-x86_64.jar:/home/njindal/.m2/repository/org/jcuda/jcublas-natives/0.8.0/jcublas-natives-0.8.0-linux-x86_64.jar:/home/njindal/.m2/repository/org/jcuda/jcufft-natives/0.8.0/jcufft-natives-0.8.0-linux-x86_64.jar:/home/njindal/.m2/repository/org/jcuda/jcusparse-natives/0.8.0/jcusparse-natives-0.8.0-linux-x86_64.jar:/home/njindal/.m2/repository/org/jcuda/jcusolver-natives/0.8.0/jcusolver-natives-0.8.0-linux-x86_64.jar:/home/njindal/.m2/repository/org/jcuda/jcurand-natives/0.8.0/jcurand-natives-0.8.0-linux-x86_64.jar:/home/njindal/.m2/repository/org/jcuda/jnvgraph-natives/0.8.0/jnvgraph-natives-0.8.0-linux-x86_64.jar:/home/njindal/.m2/repository/org/jcuda/jcudnn-natives/0.8.0/jcudnn-natives-0.8.0-linux-x86_64.jar:/home/njindal/.m2/repository/org/apache/spark/spark-mllib_2.11/2.1.0/spark-mllib_2.11-2.1.0.jar:/home/njindal/.m2/repository/org/apache/spark/spark-core_2.11/2.1.0/spark-core_2.11-2.1.0.jar:/home/njindal/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/home/njindal/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/home/njindal/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar:/home/njindal/.m2/repository/com/twitter/chill_2.11/0.8.0/chill_2.11-0.8.0.jar:/home/njindal/.m2/repository/com/esotericsoftware/kryo-shaded/3.0.3/kryo-shaded-3.0.3.jar:/home/njindal/.m2/repository/com/esotericsoftware/minlog/1.3.0/minlog-1.3.0.jar:/home/njindal/.m2/repository/com/twitter/chill-java/0.8.0/chill-java-0.8.0.jar:/home/njindal/.m2/repository/org/apache/xbean/xbean-asm5-shaded/4.4/xbean-asm5-shaded-4.4.jar:/home/njindal/.m2/repository/org/apache/spark/spark-launcher_2.11/2.1.0/spark-launcher_2.11-2.1.0.jar:/home/njindal/.m2/repository/org/apache/spark/spark-network-common_2.11/2.1.0/spark-network-common_2.11-2.1.0.jar:/home/njindal/.m2/repository/com/faster
xml/jackson/core/jackson-annotations/2.6.5/jackson-annotations-2.6.5.jar:/home/njindal/.m2/repository/org/apache/spark/spark-network-shuffle_2.11/2.1.0/spark-network-shuffle_2.11-2.1.0.jar:/home/njindal/.m2/repository/org/apache/spark/spark-unsafe_2.11/2.1.0/spark-unsafe_2.11-2.1.0.jar:/home/njindal/.m2/repository/javax/servlet/javax.servlet-api/3.1.0/javax.servlet-api-3.1.0.jar:/home/njindal/.m2/repository/org/apache/commons/commons-lang3/3.5/commons-lang3-3.5.jar:/home/njindal/.m2/repository/org/slf4j/jul-to-slf4j/1.7.16/jul-to-slf4j-1.7.16.jar:/home/njindal/.m2/repository/org/slf4j/jcl-over-slf4j/1.7.16/jcl-over-slf4j-1.7.16.jar:/home/njindal/.m2/repository/com/ning/compress-lzf/1.0.3/compress-lzf-1.0.3.jar:/home/njindal/.m2/repository/org/xerial/snappy/snappy-java/1.1.2.6/snappy-java-1.1.2.6.jar:/home/njindal/.m2/repository/net/jpountz/lz4/lz4/1.3.0/lz4-1.3.0.jar:/home/njindal/.m2/repository/org/roaringbitmap/RoaringBitmap/0.5.11/RoaringBitmap-0.5.11.jar:/home/njindal/.m2/repository/org/json4s/json4s-jackson_2.11/3.2.11/json4s-jackson_2.11-3.2.11.jar:/home/njindal/.m2/repository/org/json4s/json4s-core_2.11/3.2.11/json4s-core_2.11-3.2.11.jar:/home/njindal/.m2/repository/org/json4s/json4s-ast_2.11/3.2.11/json4s-ast_2.11-3.2.11.jar:/home/njindal/.m2/repository/org/glassfish/jersey/core/jersey-client/2.22.2/jersey-client-2.22.2.jar:/home/njindal/.m2/repository/javax/ws/rs/javax.ws.rs-api/2.0.1/javax.ws.rs-api-2.0.1.jar:/home/njindal/.m2/repository/org/glassfish/hk2/hk2-api/2.4.0-b34/hk2-api-2.4.0-b34.jar:/home/njindal/.m2/repository/org/glassfish/hk2/hk2-utils/2.4.0-b34/hk2-utils-2.4.0-b34.jar:/home/njindal/.m2/repository/org/glassfish/hk2/external/aopalliance-repackaged/2.4.0-b34/aopalliance-repackaged-2.4.0-b34.jar:/home/njindal/.m2/repository/org/glassfish/hk2/external/javax.inject/2.4.0-b34/javax.inject-2.4.0-b34.jar:/home/njindal/.m2/repository/org/glassfish/hk2/hk2-locator/2.4.0-b34/hk2-locator-2.4.0-b34.jar:/home/njindal/.m2/repository/org/javassist/javassist
/3.18.1-GA/javassist-3.18.1-GA.jar:/home/njindal/.m2/repository/org/glassfish/jersey/core/jersey-common/2.22.2/jersey-common-2.22.2.jar:/home/njindal/.m2/repository/javax/annotation/javax.annotation-api/1.2/javax.annotation-api-1.2.jar:/home/njindal/.m2/repository/org/glassfish/jersey/bundles/repackaged/jersey-guava/2.22.2/jersey-guava-2.22.2.jar:/home/njindal/.m2/repository/org/glassfish/hk2/osgi-resource-locator/1.0.1/osgi-resource-locator-1.0.1.jar:/home/njindal/.m2/repository/org/glassfish/jersey/core/jersey-server/2.22.2/jersey-server-2.22.2.jar:/home/njindal/.m2/repository/org/glassfish/jersey/media/jersey-media-jaxb/2.22.2/jersey-media-jaxb-2.22.2.jar:/home/njindal/.m2/repository/javax/validation/validation-api/1.1.0.Final/validation-api-1.1.0.Final.jar:/home/njindal/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet/2.22.2/jersey-container-servlet-2.22.2.jar:/home/njindal/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet-core/2.22.2/jersey-container-servlet-core-2.22.2.jar:/home/njindal/.m2/repository/io/netty/netty-all/4.0.42.Final/netty-all-4.0.42.Final.jar:/home/njindal/.m2/repository/com/clearspring/analytics/stream/2.7.0/stream-2.7.0.jar:/home/njindal/.m2/repository/io/dropwizard/metrics/metrics-core/3.1.2/metrics-core-3.1.2.jar:/home/njindal/.m2/repository/io/dropwizard/metrics/metrics-jvm/3.1.2/metrics-jvm-3.1.2.jar:/home/njindal/.m2/repository/io/dropwizard/metrics/metrics-json/3.1.2/metrics-json-3.1.2.jar:/home/njindal/.m2/repository/io/dropwizard/metrics/metrics-graphite/3.1.2/metrics-graphite-3.1.2.jar:/home/njindal/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.6.5/jackson-databind-2.6.5.jar:/home/njindal/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.6.5/jackson-core-2.6.5.jar:/home/njindal/.m2/repository/com/fasterxml/jackson/module/jackson-module-scala_2.11/2.6.5/jackson-module-scala_2.11-2.6.5.jar:/home/njindal/.m2/repository/com/fasterxml/jackson/module/jackson-module-paranam
er/2.6.5/jackson-module-paranamer-2.6.5.jar:/home/njindal/.m2/repository/org/apache/ivy/ivy/2.4.0/ivy-2.4.0.jar:/home/njindal/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:/home/njindal/.m2/repository/net/razorvine/pyrolite/4.13/pyrolite-4.13.jar:/home/njindal/.m2/repository/net/sf/py4j/py4j/0.10.4/py4j-0.10.4.jar:/home/njindal/.m2/repository/org/apache/commons/commons-crypto/1.0.0/commons-crypto-1.0.0.jar:/home/njindal/.m2/repository/org/apache/spark/spark-streaming_2.11/2.1.0/spark-streaming_2.11-2.1.0.jar:/home/njindal/.m2/repository/org/apache/spark/spark-sql_2.11/2.1.0/spark-sql_2.11-2.1.0.jar:/home/njindal/.m2/repository/com/univocity/univocity-parsers/2.2.1/univocity-parsers-2.2.1.jar:/home/njindal/.m2/repository/org/apache/spark/spark-sketch_2.11/2.1.0/spark-sketch_2.11-2.1.0.jar:/home/njindal/.m2/repository/org/apache/spark/spark-catalyst_2.11/2.1.0/spark-catalyst_2.11-2.1.0.jar:/home/njindal/.m2/repository/org/apache/parquet/parquet-column/1.8.1/parquet-column-1.8.1.jar:/home/njindal/.m2/repository/org/apache/parquet/parquet-common/1.8.1/parquet-common-1.8.1.jar:/home/njindal/.m2/repository/org/apache/parquet/parquet-encoding/1.8.1/parquet-encoding-1.8.1.jar:/home/njindal/.m2/repository/org/apache/parquet/parquet-hadoop/1.8.1/parquet-hadoop-1.8.1.jar:/home/njindal/.m2/repository/org/apache/parquet/parquet-format/2.3.0-incubating/parquet-format-2.3.0-incubating.jar:/home/njindal/.m2/repository/org/apache/parquet/parquet-jackson/1.8.1/parquet-jackson-1.8.1.jar:/home/njindal/.m2/repository/org/apache/spark/spark-graphx_2.11/2.1.0/spark-graphx_2.11-2.1.0.jar:/home/njindal/.m2/repository/com/github/fommil/netlib/core/1.1.2/core-1.1.2.jar:/home/njindal/.m2/repository/net/sourceforge/f2j/arpack_combined_all/0.1/arpack_combined_all-0.1.jar:/home/njindal/.m2/repository/org/apache/spark/spark-mllib-local_2.11/2.1.0/spark-mllib-local_2.11-2.1.0.jar:/home/njindal/.m2/repository/org/scalanlp/breeze_2.11/0.12/breeze_2.11-0.12.jar:/home/njindal/.m2/repository/org/scalanlp/b
reeze-macros_2.11/0.12/breeze-macros_2.11-0.12.jar:/home/njindal/.m2/repository/net/sf/opencsv/opencsv/2.3/opencsv-2.3.jar:/home/njindal/.m2/repository/com/github/rwl/jtransforms/2.4.0/jtransforms-2.4.0.jar:/home/njindal/.m2/repository/org/spire-math/spire_2.11/0.7.4/spire_2.11-0.7.4.jar:/home/njindal/.m2/repository/org/spire-math/spire-macros_2.11/0.7.4/spire-macros_2.11-0.7.4.jar:/home/njindal/.m2/repository/com/chuusai/shapeless_2.11/2.0.0/shapeless_2.11-2.0.0.jar:/home/njindal/.m2/repository/org/jpmml/pmml-model/1.2.15/pmml-model-1.2.15.jar:/home/njindal/.m2/repository/org/jpmml/pmml-schema/1.2.15/pmml-schema-1.2.15.jar:/home/njindal/.m2/repository/org/apache/spark/spark-tags_2.11/2.1.0/spark-tags_2.11-2.1.0.jar:/home/njindal/.m2/repository/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-common/2.6.0/hadoop-common-2.6.0.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-annotations/2.6.0/hadoop-annotations-2.6.0.jar:/usr/lib/jvm/java-8-oracle/lib/tools.jar:/home/njindal/.m2/repository/com/google/guava/guava/11.0.2/guava-11.0.2.jar:/home/njindal/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/home/njindal/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/home/njindal/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/home/njindal/.m2/repository/commons-codec/commons-codec/1.4/commons-codec-1.4.jar:/home/njindal/.m2/repository/commons-io/commons-io/2.4/commons-io-2.4.jar:/home/njindal/.m2/repository/commons-net/commons-net/3.1/commons-net-3.1.jar:/home/njindal/.m2/repository/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.jar:/home/njindal/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/home/njindal/.m2/repository/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar:/home/njindal/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar:/home/njindal/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar:/hom
e/njindal/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar:/home/njindal/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar:/home/njindal/.m2/repository/asm/asm/3.1/asm-3.1.jar:/home/njindal/.m2/repository/tomcat/jasper-compiler/5.5.23/jasper-compiler-5.5.23.jar:/home/njindal/.m2/repository/tomcat/jasper-runtime/5.5.23/jasper-runtime-5.5.23.jar:/home/njindal/.m2/repository/javax/servlet/jsp/jsp-api/2.1/jsp-api-2.1.jar:/home/njindal/.m2/repository/commons-el/commons-el/1.0/commons-el-1.0.jar:/home/njindal/.m2/repository/net/java/dev/jets3t/jets3t/0.9.0/jets3t-0.9.0.jar:/home/njindal/.m2/repository/org/apache/httpcomponents/httpclient/4.1.2/httpclient-4.1.2.jar:/home/njindal/.m2/repository/org/apache/httpcomponents/httpcore/4.1.2/httpcore-4.1.2.jar:/home/njindal/.m2/repository/com/jamesmurty/utils/java-xmlbuilder/0.4/java-xmlbuilder-0.4.jar:/home/njindal/.m2/repository/commons-lang/commons-lang/2.6/commons-lang-2.6.jar:/home/njindal/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/home/njindal/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/home/njindal/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/home/njindal/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/home/njindal/.m2/repository/org/slf4j/slf4j-api/1.7.5/slf4j-api-1.7.5.jar:/home/njindal/.m2/repository/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar:/home/njindal/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar:/home/njindal/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/home/njindal/.m2/repository/org/apache/avro/avro/1.7.4/avro-1.7.4.jar:/home/njindal/.m2/repository/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar:/home/njindal/.m2/repository/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar:/home/njindal/.m2/r
epository/org/apache/hadoop/hadoop-auth/2.6.0/hadoop-auth-2.6.0.jar:/home/njindal/.m2/repository/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar:/home/njindal/.m2/repository/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar:/home/njindal/.m2/repository/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar:/home/njindal/.m2/repository/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar:/home/njindal/.m2/repository/org/apache/curator/curator-framework/2.6.0/curator-framework-2.6.0.jar:/home/njindal/.m2/repository/com/jcraft/jsch/0.1.42/jsch-0.1.42.jar:/home/njindal/.m2/repository/org/apache/curator/curator-client/2.6.0/curator-client-2.6.0.jar:/home/njindal/.m2/repository/org/apache/curator/curator-recipes/2.6.0/curator-recipes-2.6.0.jar:/home/njindal/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/home/njindal/.m2/repository/org/htrace/htrace-core/3.0.4/htrace-core-3.0.4.jar:/home/njindal/.m2/repository/org/apache/zookeeper/zookeeper/3.4.6/zookeeper-3.4.6.jar:/home/njindal/.m2/repository/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar:/home/njindal/.m2/repository/org/tukaani/xz/1.0/xz-1.0.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.6.0/hadoop-hdfs-2.6.0.jar:/home/njindal/.m2/repository/commons-daemon/commons-daemon/1.0.13/commons-daemon-1.0.13.jar:/home/njindal/.m2/repository/io/netty/netty/3.6.2.Final/netty-3.6.2.Final.jar:/home/njindal/.m2/repository/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar:/home/njindal/.m2/repository/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-client/2.6.0/hadoop-client-2.6.0.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-core/2.6.0/hadoop-mapreduce-client-core-2.6.0.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-app/2.6.0/hadoop-mapreduce-client-app-2.6
.0.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-yarn-server-web-proxy/2.6.0/hadoop-yarn-server-web-proxy-2.6.0.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.6.0/hadoop-mapreduce-client-shuffle-2.6.0.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-yarn-server-nodemanager/2.6.0/hadoop-yarn-server-nodemanager-2.6.0.jar:/home/njindal/.m2/repository/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar:/home/njindal/.m2/repository/com/google/inject/extensions/guice-servlet/3.0/guice-servlet-3.0.jar:/home/njindal/.m2/repository/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar:/home/njindal/.m2/repository/org/apache/commons/commons-math3/3.4.1/commons-math3-3.4.1.jar:/home/njindal/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar:/home/njindal/.m2/repository/javax/mail/mail/1.4/mail-1.4.jar:/home/njindal/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:/home/njindal/.m2/repository/org/apache/wink/wink-json4j/1.4/wink-json4j-1.4.jar:/home/njindal/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/home/njindal/.m2/repository/org/codehaus/janino/commons-compiler/3.0.0/commons-compiler-3.0.0.jar:/home/njindal/.m2/repository/org/antlr/antlr4-runtime/4.5.3/antlr4-runtime-4.5.3.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.6.0/hadoop-yarn-api-2.6.0.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-yarn-common/2.6.0/hadoop-yarn-common-2.6.0.jar:/home/njindal/.m2/repository/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar:/home/njindal/.m2/repository/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar:/home/njindal/.m2/repository/javax/servlet/servlet-api/2.5/servlet-api-2.5.jar:/home/njindal/.m2/repository/com/sun/jersey/jersey-client/1.9/jersey-client-1.9.jar:/home/njindal/.m2/repository/org/codehaus/jackson/jackson-jaxrs/1.9.13/jackson-jaxrs-1.9.13.jar:/home/njindal/.m2/repository/org/codehaus/jackson/
jackson-xc/1.9.13/jackson-xc-1.9.13.jar:/home/njindal/.m2/repository/com/google/inject/guice/3.0/guice-3.0.jar:/home/njindal/.m2/repository/javax/inject/javax.inject/1/javax.inject-1.jar:/home/njindal/.m2/repository/aopalliance/aopalliance/1.0/aopalliance-1.0.jar:/home/njindal/.m2/repository/com/sun/jersey/contribs/jersey-guice/1.9/jersey-guice-1.9.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.6.0/hadoop-yarn-client-2.6.0.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-common/2.6.0/hadoop-mapreduce-client-common-2.6.0.jar:/home/njindal/.m2/repository/org/apache/hadoop/hadoop-yarn-server-common/2.6.0/hadoop-yarn-server-common-2.6.0.jar:/home/njindal/.m2/repository/org/scala-lang/scala-library/2.11.8/scala-library-2.11.8.jar:/home/njindal/.m2/repository/org/scala-lang/scala-reflect/2.11.7/scala-reflect-2.11.7.jar:/home/njindal/.m2/repository/org/objenesis/objenesis/1.0/objenesis-1.0.jar org.apache.sysml.api.DMLScript -f scripts/nn/test/run_tests.dml -exec singlenode -config /home/njindal/git/incubator-systemml/conf/SystemML-config.xml -gpu force -stats 100
17/05/30 17:24:19 INFO api.DMLScript: BEGIN DML run 05/30/2017 17:24:19
17/05/30 17:24:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/30 17:24:21 INFO context.GPUContext: Initializing CUDA

Starting grad checks.
---
17/05/30 17:24:22 INFO context.GPUContext:  GPU memory - Total: 2096.300032 MB, Available: 1295.9743999999998 MB on GPUContext{deviceNum=0}
17/05/30 17:24:22 INFO context.GPUContext: Total number of GPUs on the machine: 1
Grad checking the cross-entropy loss function.
Grad checking the L1 loss function.
Grad checking the L1 regularization function.
Grad checking the L2 loss function.
Grad checking the L2 regularization function.
Grad checking the log loss function.

Grad checking the affine layer with L2 loss.
 - Grad checking X.
 - Grad checking W.
 - Grad checking b.
Grad checking the 1D batch normalization layer with L2 loss.
 - Grad checking the 'train' mode.
   - Grad checking X.
   - Grad checking gamma.
   - Grad checking beta.
 - Grad checking the 'test' mode.
   - Grad checking X.
   - Grad checking gamma.
   - Grad checking beta.
Grad checking the 2D (spatial) batch normalization layer with L2 loss.
 - Grad checking the 'train' mode.
   - Grad checking X.
   - Grad checking gamma.
   - Grad checking beta.
 - Grad checking the 'test' mode.
   - Grad checking X.
   - Grad checking gamma.
   - Grad checking beta.
Grad checking the `im2col` 2D convolutional layer with L2 loss.
17/05/30 17:24:28 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/05/30 17:24:28 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
 - Grad checking X.
 - Grad checking W.
 - Grad checking b.
Grad checking the built-in 2D convolutional layer with L2 loss.
 - Grad checking X.
 - Grad checking W.
WARNING: Relative error 3.063931109511093E-4 > 1.0E-4 & <= 0.01 with -11.682479557456533 analytical vs -11.689640614065409 numerical, with lossph 40.510115394324195 and lossmh 40.51034918713648
WARNING: Relative error 6.785572589631694E-4 > 1.0E-4 & <= 0.01 with -14.363880156229683 analytical vs -14.383386822913733 numerical, with lossph 40.510088543924184 and lossmh 40.51037621166064
WARNING: Relative error 8.117464157218959E-4 > 1.0E-4 & <= 0.01 with -13.400658690617757 analytical vs -13.378920463225084 numerical, with lossph 40.51009898805432 and lossmh 40.51036656646358
WARNING: Relative error 6.785567321010216E-4 > 1.0E-4 & <= 0.01 with -14.37300870216048 analytical vs -14.39252775057298 numerical, with lossph 40.510088452456074 and lossmh 40.510376303011085
WARNING: Relative error 0.0023065358169588085 > 1.0E-4 & <= 0.01 with -15.081214796672182 analytical vs -15.011804170583785 numerical, with lossph 40.510081360786614 and lossmh 40.510381596870026
WARNING: Relative error 1.2020843619724922E-4 > 1.0E-4 & <= 0.01 with -14.602099111310885 analytical vs -14.60561012436301 numerical, with lossph 40.51008637609418 and lossmh 40.510378488296666
WARNING: Relative error 3.063921242335014E-4 > 1.0E-4 & <= 0.01 with -11.654549775926586 analytical vs -11.66169368929104 numerical, with lossph 40.51011567395115 and lossmh 40.510348907824934
 - Grad checking b.
Grad checking the simple reference 2D convolutional layer with L2 loss.
 - Grad checking X.
 - Grad checking W.
 - Grad checking b.
Grad checking the 2D convolution transpose layer with L2 loss.
 - Grad checking X.
WARNING: Relative error 6.785553488451468E-4 > 1.0E-4 & <= 0.01 with 0.2480096633484488 analytical vs 0.2483464684566172 numerical, with lossph 8.25432627928163 and lossmh 8.25432131235226
WARNING: Relative error 8.117342148227497E-4 > 1.0E-4 & <= 0.01 with 0.46178385247729725 analytical vs 0.4610347690281457 numerical, with lossph 8.254328419943578 and lossmh 8.254319199248197
WARNING: Relative error 6.78555370922306E-4 > 1.0E-4 & <= 0.01 with 0.5511303465906289 analytical vs 0.5518787993707974 numerical, with lossph 8.254329314621874 and lossmh 8.254318277045886
WARNING: Relative error 8.117020436730868E-4 > 1.0E-4 & <= 0.01 with 0.13829553169194606 analytical vs 0.13807120424758068 numerical, with lossph 8.254325180655963 and lossmh 8.254322419231878
WARNING: Relative error 8.117328203683862E-4 > 1.0E-4 & <= 0.01 with 0.5055309144436196 analytical vs 0.5047108680322765 numerical, with lossph 8.25432885801433 and lossmh 8.25431876379697
WARNING: Relative error 8.116455308945274E-4 > 1.0E-4 & <= 0.01 with 0.06899396037823916 analytical vs 0.06888205392741042 numerical, with lossph 8.254324486699112 and lossmh 8.254323109058033
WARNING: Relative error 6.785554871822532E-4 > 1.0E-4 & <= 0.01 with -0.13350593809809497 analytical vs -0.1336872434976044 numerical, with lossph 8.254322458935555 and lossmh 8.254325132680425
WARNING: Relative error 6.785552242935504E-4 > 1.0E-4 & <= 0.01 with -0.2724052650635402 analytical vs -0.2727752001163708 numerical, with lossph 8.254321068071604 and lossmh 8.254326523575607
WARNING: Relative error 6.785555175584701E-4 > 1.0E-4 & <= 0.01 with -0.2904759680044567 analytical vs -0.29087044381981286 numerical, with lossph 8.254320887119167 and lossmh 8.254326704528044
WARNING: Relative error 8.117365268798438E-4 > 1.0E-4 & <= 0.01 with 0.3720728122335215 analytical vs 0.37146925198072717 numerical, with lossph 8.254327521604598 and lossmh 8.254320092219558
WARNING: Relative error 8.117996956316842E-4 > 1.0E-4 & <= 0.01 with 0.14788594799412863 analytical vs 0.1476460352201059 numerical, with lossph 8.254325267869831 and lossmh 8.254322314949126
WARNING: Relative error 8.119012708962542E-4 > 1.0E-4 & <= 0.01 with -0.07795973031927872 analytical vs -0.07783324180721252 numerical, with lossph 8.254323015177858 and lossmh 8.254324571842695
WARNING: Relative error 8.117353519435853E-4 > 1.0E-4 & <= 0.01 with 0.48348549268368723 analytical vs 0.48270120478477446 numerical, with lossph 8.254328637254696 and lossmh 8.2543189832306
WARNING: Relative error 6.785553268852095E-4 > 1.0E-4 & <= 0.01 with 0.4883649684844016 analytical vs 0.48902818381435503 numerical, with lossph 8.254328686121752 and lossmh 8.254318905558076
WARNING: Relative error 8.117788275473475E-4 > 1.0E-4 & <= 0.01 with 0.3804938800778617 analytical vs 0.37987662739880074 numerical, with lossph 8.254327583249065 and lossmh 8.254319985716517
WARNING: Relative error 6.785547468313855E-4 > 1.0E-4 & <= 0.01 with 0.07631309322132188 analytical vs 0.07641672876701477 numerical, with lossph 8.254324560007202 and lossmh 8.254323031672627
WARNING: Relative error 6.785553971200803E-4 > 1.0E-4 & <= 0.01 with -0.342925836087173 analytical vs -0.34339154044715764 numerical, with lossph 8.254320361892585 and lossmh 8.254327229723394
WARNING: Relative error 6.785551565868137E-4 > 1.0E-4 & <= 0.01 with -0.02491798838300011 analytical vs -0.02495182780393179 numerical, with lossph 8.254323546295183 and lossmh 8.254324045331739
WARNING: Relative error 8.117758660292993E-4 > 1.0E-4 & <= 0.01 with -0.29017371666857017 analytical vs -0.2897029867554579 numerical, with lossph 8.254320890135642 and lossmh 8.254326684195377
WARNING: Relative error 6.785553229289772E-4 > 1.0E-4 & <= 0.01 with -0.5072646368526688 analytical vs -0.5079535185359418 numerical, with lossph 8.254318716279274 and lossmh 8.254328875349644
WARNING: Relative error 6.78554778315526E-4 > 1.0E-4 & <= 0.01 with 0.03126357863101518 analytical vs 0.031306035541689425 numerical, with lossph 8.254324108883962 and lossmh 8.254323482763251
WARNING: Relative error 6.785553099932536E-4 > 1.0E-4 & <= 0.01 with -0.2942630860026224 analytical vs -0.2946627047251127 numerical, with lossph 8.254320849187412 and lossmh 8.254326742441506
WARNING: Relative error 6.785551200008403E-4 > 1.0E-4 & <= 0.01 with 0.08135480331223598 analytical vs 0.08146528571728595 numerical, with lossph 8.254324610476463 and lossmh 8.254322981170748
 - Grad checking W.
WARNING: Relative error 1.2021043996600452E-4 > 1.0E-4 & <= 0.01 with -0.6822178752109117 analytical vs -0.6823819143519926 numerical, with lossph 8.254316974560194 and lossmh 8.25433062219848
WARNING: Relative error 3.0638425696629187E-4 > 1.0E-4 & <= 0.01 with -1.4508166976784973 analytical vs -1.4517059849339373 numerical, with lossph 8.254309268100457 and lossmh 8.254338302220155
WARNING: Relative error 8.117943113692949E-4 > 1.0E-4 & <= 0.01 with 0.7308501955876874 analytical vs 0.7296645580190385 numerical, with lossph 8.254331070748734 and lossmh 8.254316477457573
WARNING: Relative error 8.117065202066276E-4 > 1.0E-4 & <= 0.01 with 0.9946502709906734 analytical vs 0.9930368523924925 numerical, with lossph 8.254333755980372 and lossmh 8.254313895243325
WARNING: Relative error 3.063898473026446E-4 > 1.0E-4 & <= 0.01 with -1.9666799496457117 analytical vs -1.967885460540941 numerical, with lossph 8.254304102377066 and lossmh 8.254343460086277
WARNING: Relative error 3.063873162245538E-4 > 1.0E-4 & <= 0.01 with -1.4461219772325682 analytical vs -1.4470083956830135 numerical, with lossph 8.254309315051604 and lossmh 8.254338255219517
WARNING: Relative error 1.2020330860452122E-4 > 1.0E-4 & <= 0.01 with -1.2286387562580745 analytical vs -1.228934164654305 numerical, with lossph 8.254311502036288 and lossmh 8.25433608071958
WARNING: Relative error 6.785553661541439E-4 > 1.0E-4 & <= 0.01 with 0.08818615423020833 analytical vs 0.08830591387010144 numerical, with lossph 8.254324678957262 and lossmh 8.254322912838985
 - Grad checking b.
Grad checking the (inverted) dropout layer with L2 loss.
Grad checking the LSTM layer with L2 loss.
 - Grad checking X.
 - Grad checking W.
 - Grad checking b.
 - Grad checking out0.
 - Grad checking c0.
Grad checking the 2D max pooling layer with L2 loss.
 - Grad checking w/ pad=0.
 - Grad checking w/ pad=1.
Grad checking the built-in 2D max pooling layer with L2 loss.
 - Grad checking w/ pad=0.
 - Grad checking w/ pad=1.
Grad checking the simple reference 2D max pooling layer with L2 loss.
 - Grad checking w/ pad=0.
 - Grad checking w/ pad=1.
Grad checking the ReLU nonlinearity layer with L2 loss.
Grad checking the simple RNN layer with L2 loss.
 - Grad checking X.
 - Grad checking W.
 - Grad checking b.
 - Grad checking out0.
Grad checking the 1D scale & shift layer with L2 loss.
 - Grad checking X.
 - Grad checking gamma.
 - Grad checking beta.
Grad checking the 2D scale & shift layer with L2 loss.
 - Grad checking X.
 - Grad checking gamma.
 - Grad checking beta.
Grad checking the sigmoid nonlinearity layer with L2 loss.
Grad checking the softmax layer with L2 loss.
Grad checking the tanh nonlinearity layer with L2 loss.


---
Grad checks complete -- look for any ERRORs or WARNINGs.
If any tests involving ReLUs failed, try a few times to ensure that they were not false negatives due to kinks being crossed.


Starting other tests.
---
Testing the 1D batch normalization function.
Testing the 2D (spatial) batch normalization function.
Testing the 2D convolution functions.
ERROR: Relative error 1.4275242179409038E-8 > 1.0E-10 with 0.2613148690102816 vs 0.26131486154961564.
ERROR: Relative error 5.19998536442815E-10 > 1.0E-10 with -1.0332757339265042 vs -1.033275735001108.
ERROR: Relative error 1.6690477019584457E-9 > 1.0E-10 with -0.2796655022366367 vs -0.2796655013030866.
ERROR: Relative error 4.026622598319469E-8 > 1.0E-10 with -0.5810464384666573 vs -0.5810463916735648.
ERROR: Relative error 1.5925147093041632E-8 > 1.0E-10 with 0.3443370985342769 vs 0.34433710950151497.
ERROR: Relative error 1.527092464737425E-8 > 1.0E-10 with 0.29907123360418936 vs 0.29907122447000095.
ERROR: Relative error 1.6981187364236183E-8 > 1.0E-10 with -1.3172908215225392 vs -1.3172908662608644.
ERROR: Relative error 1.1249863733341583E-9 > 1.0E-10 with 2.372276123264216 vs 2.3722761179266594.
ERROR: Relative error 2.0761911589432542E-8 > 1.0E-10 with 1.2022032376875469 vs 1.2022031877674733.
ERROR: Relative error 1.565994385459594E-8 > 1.0E-10 with -3.2028287262088755 vs -3.202828826521113.
ERROR: Relative error 1.6864676187009944E-8 > 1.0E-10 with -1.1617410552745209 vs -1.1617410160897481.
ERROR: Relative error 6.7621761573523795E-9 > 1.0E-10 with -2.357698874691257 vs -2.3576989065776073.
ERROR: Relative error 8.077058791206047E-9 > 1.0E-10 with -0.826672150112067 vs -0.8266721634662262.
ERROR: Relative error 3.126862662452838E-7 > 1.0E-10 with 0.13533522445012097 vs 0.13533530908507949.
ERROR: Relative error 1.4938685572516403E-8 > 1.0E-10 with 1.6979405686913527 vs 1.697940619421354.
ERROR: Relative error 1.5016745260435074E-9 > 1.0E-10 with -1.1265715161920746 vs -1.1265715128085871.
ERROR: Relative error 1.57421411011751E-8 > 1.0E-10 with -1.4288550242468203 vs -1.4288549792603462.
ERROR: Relative error 5.967223900831169E-9 > 1.0E-10 with 3.3608783218897167 vs 3.360878361999944.
ERROR: Relative error 2.6680716508589268E-8 > 1.0E-10 with -1.017766223123622 vs -1.0177662774330876.
ERROR: Relative error 2.982949129497961E-8 > 1.0E-10 with -0.36147647765070573 vs -0.36147649921602526.
ERROR: Relative error 3.657840826605735E-8 > 1.0E-10 with 0.45134768685895427 vs 0.4513477198781154.
ERROR: Relative error 1.4907379969827675E-8 > 1.0E-10 with -1.5573016894495448 vs -1.5573017358801216.
ERROR: Relative error 1.5786845592838222E-8 > 1.0E-10 with 0.14299706440454166 vs 0.14299706891948688.
ERROR: Relative error 5.200703379619866E-10 > 1.0E-10 with 1.9947644445978157 vs 1.99476444252298.
ERROR: Relative error 6.52328960308163E-9 > 1.0E-10 with -0.9699901535166611 vs -0.9699901661717145.
ERROR: Relative error 1.1700565471480131E-8 > 1.0E-10 with 0.6822547438763443 vs 0.6822547598418771.
ERROR: Relative error 1.7446876565463836E-10 > 1.0E-10 with 1.0078137191134946 vs 1.0078137194651586.
ERROR: Relative error 1.1440123355883945E-8 > 1.0E-10 with 0.2932107414338165 vs 0.29321074814255066.
ERROR: Relative error 3.136300157201714E-8 > 1.0E-10 with -0.12095057939678663 vs -0.1209505869835333.
ERROR: Relative error 6.045675521257672E-9 > 1.0E-10 with -1.6284907233105383 vs -1.6284907036198855.
ERROR: Relative error 1.9960135534777884E-7 > 1.0E-10 with 0.05796215971397543 vs 0.05796218285263132.
ERROR: Relative error 1.4599148103004017E-8 > 1.0E-10 with -1.5695035918656892 vs -1.5695036376925207.
ERROR: Relative error 6.543527908682189E-9 > 1.0E-10 with -1.0879498840683728 vs -1.0879498983064337.
ERROR: Relative error 2.863818344889719E-8 > 1.0E-10 with 0.986026344823579 vs 0.9860264012995873.
ERROR: Relative error 1.1056654568999266E-8 > 1.0E-10 with 0.24707866293721786 vs 0.24707866840094478.
ERROR: Relative error 6.455137924885195E-8 > 1.0E-10 with -0.40055946263778364 vs -0.4005594109244554.
ERROR: Relative error 1.639718543077429E-8 > 1.0E-10 with -2.1781136614834082 vs -2.178113590053542.
ERROR: Relative error 4.8798430636206827E-8 > 1.0E-10 with 0.05076196924826959 vs 0.0507619742024787.
ERROR: Relative error 9.165908471957055E-9 > 1.0E-10 with 1.8794282546494758 vs 1.8794282201961414.
ERROR: Relative error 2.978816706402979E-8 > 1.0E-10 with 0.3975604976144726 vs 0.39756052129967034.
ERROR: Relative error 2.963621786230762E-8 > 1.0E-10 with 0.6415698417229411 vs 0.6415698797503494.
ERROR: Relative error 2.3109709555514415E-8 > 1.0E-10 with -1.1982869673393794 vs -1.1982870227235083.
ERROR: Relative error 3.2115462610775645E-10 > 1.0E-10 with -2.0857041042357167 vs -2.0857041055753838.
ERROR: Relative error 9.948630979193247E-8 > 1.0E-10 with -0.10622623825604993 vs -0.10622625939216493.
ERROR: Relative error 6.778314445963566E-8 > 1.0E-10 with 0.2196231226007214 vs 0.21962309282723172.
ERROR: Relative error 2.5809976150965856E-9 > 1.0E-10 with 2.588105950679918 vs 2.5881059640397086.
ERROR: Relative error 9.087554361084105E-8 > 1.0E-10 with 0.09544286378739902 vs 0.09544284644055634.
ERROR: Relative error 1.8858614351612048E-8 > 1.0E-10 with -0.5589644431314431 vs -0.5589644220488538.
ERROR: Relative error 7.736763250891607E-8 > 1.0E-10 with 0.04561424549587588 vs 0.04561425255400879.
ERROR: Relative error 2.4394587006452004E-8 > 1.0E-10 with 0.8068575646006609 vs 0.806857603966576.
ERROR: Relative error 1.2147537773849554E-8 > 1.0E-10 with -1.2798666770117564 vs -1.2798666459172992.
ERROR: Relative error 4.395188501268004E-9 > 1.0E-10 with 0.77094736022815 vs 0.7709473534512321.
ERROR: Relative error 4.273504230193166E-8 > 1.0E-10 with -0.04529170878696899 vs -0.04529171265805534.
ERROR: Relative error 5.193072042217285E-9 > 1.0E-10 with 0.26683412238923976 vs 0.26683411961786213.
ERROR: Relative error 5.044924487623104E-9 > 1.0E-10 with -0.743458732252303 vs -0.7434587247509167.
ERROR: Relative error 1.7148401756786993E-9 > 1.0E-10 with 0.8849261094606061 vs 0.8849261064255924.
ERROR: Relative error 5.823286681596099E-10 > 1.0E-10 with -2.926710388969381 vs -2.9267103855607663.
ERROR: Relative error 4.8325979184277686E-9 > 1.0E-10 with 1.8593052545877329 vs 1.8593052725582824.
ERROR: Relative error 6.601979216797429E-8 > 1.0E-10 with 0.5235158749686739 vs 0.523515944093497.
ERROR: Relative error 3.434129104490014E-8 > 1.0E-10 with -1.062474946283946 vs -1.0624750192574712.
ERROR: Relative error 1.8295696384999767E-8 > 1.0E-10 with -0.2937149434414278 vs -0.29371493269398913.
ERROR: Relative error 1.0063200936207417E-9 > 1.0E-10 with 1.119269021693839 vs 1.1192690239465248.
ERROR: Relative error 4.809906625420993E-9 > 1.0E-10 with 1.5742089463824267 vs 1.5742089615260229.
ERROR: Relative error 9.943604666704722E-9 > 1.0E-10 with 0.8775827136936241 vs 0.8775827311462954.
ERROR: Relative error 6.3483134114491796E-9 > 1.0E-10 with -0.698009708220039 vs -0.6980096993576703.
ERROR: Relative error 4.535157418604792E-8 > 1.0E-10 with -0.1749102977141979 vs -0.17491028184928392.
ERROR: Relative error 2.2360276379519686E-9 > 1.0E-10 with -0.8108452879638077 vs -0.8108452843376628.
ERROR: Relative error 1.0374420129323381E-8 > 1.0E-10 with -0.8834414065663451 vs -0.8834413882359606.
ERROR: Relative error 1.1878019489316737E-9 > 1.0E-10 with 2.532993506117655 vs 2.5329935121350444.
ERROR: Relative error 2.70754303169682E-7 > 1.0E-10 with 0.0717457323097893 vs 0.07174577116073133.
ERROR: Relative error 9.53853266322461E-10 > 1.0E-10 with -0.6740828968349293 vs -0.6740828955489769.
ERROR: Relative error 5.7021836168127075E-9 > 1.0E-10 with -1.2861467707448162 vs -1.2861467854125064.
ERROR: Relative error 7.72738725803068E-9 > 1.0E-10 with 0.6463938706474603 vs 0.6463938806373319.
ERROR: Relative error 1.6274098482424904E-8 > 1.0E-10 with 0.3306564558987715 vs 0.33065646666104315.
ERROR: Relative error 3.103682986912765E-8 > 1.0E-10 with -0.6970667742796319 vs -0.697066817549119.
ERROR: Relative error 1.535686164274709E-8 > 1.0E-10 with 0.701665021104026 vs 0.701664999553281.
ERROR: Relative error 4.100056878916496E-9 > 1.0E-10 with -2.1103616009334836 vs -2.110361618238689.
ERROR: Relative error 8.123120975873666E-9 > 1.0E-10 with 1.2604178829093742 vs 1.2604179033864282.
ERROR: Relative error 1.0941209833039406E-8 > 1.0E-10 with 1.2266289837112507 vs 1.2266290105528612.
ERROR: Relative error 5.541442492773271E-7 > 1.0E-10 with -0.012777216573308391 vs -0.012777202412474067.
ERROR: Relative error 4.2253762431563665E-9 > 1.0E-10 with -1.339045610669553 vs -1.339045621985496.
ERROR: Relative error 8.360566732946797E-8 > 1.0E-10 with 0.1561570755476005 vs 0.1561571016588357.
ERROR: Relative error 2.767267007771319E-9 > 1.0E-10 with -2.6167992977337984 vs -2.616799312216563.
ERROR: Relative error 1.7289233542875815E-8 > 1.0E-10 with 1.0579964137565983 vs 1.057996450340493.
ERROR: Relative error 1.7154762078446898E-8 > 1.0E-10 with -0.5334469910041575 vs -0.5334469727018454.
ERROR: Relative error 3.110409084496228E-9 > 1.0E-10 with 1.7670386817474821 vs 1.7670386707550558.
ERROR: Relative error 1.1173482982155575E-8 > 1.0E-10 with -1.1253843105032257 vs -1.125384335652151.
ERROR: Relative error 9.285717485307979E-9 > 1.0E-10 with -1.1188404636483114 vs -1.1188404428698386.
ERROR: Relative error 2.1555251644095785E-9 > 1.0E-10 with 1.1254590576532526 vs 1.1254590625051633.
ERROR: Relative error 2.319234552374325E-8 > 1.0E-10 with 0.9556977590880147 vs 0.955697803417761.
ERROR: Relative error 4.4763326839679915E-9 > 1.0E-10 with 1.591009326384972 vs 1.5910093406287462.
ERROR: Relative error 9.487915016929957E-9 > 1.0E-10 with 0.5884221509707697 vs 0.588422139804971.
ERROR: Relative error 1.7764401531913022E-8 > 1.0E-10 with 0.916673465359319 vs 0.9166734979276306.
ERROR: Relative error 2.3563316249488676E-8 > 1.0E-10 with -0.13562099666658628 vs -0.13562099027522556.
ERROR: Relative error 4.394744814951474E-9 > 1.0E-10 with 2.0302516382886835 vs 2.0302516561335593.
ERROR: Relative error 6.460001556392016E-10 > 1.0E-10 with 1.3949844117223749 vs 1.3949844099200546.
ERROR: Relative error 8.778931914804406E-9 > 1.0E-10 with -1.2093561879421602 vs -1.2093562091758716.
ERROR: Relative error 1.3404298361036346E-9 > 1.0E-10 with -1.7649810877732333 vs -1.7649810830415666.
ERROR: Relative error 8.428228195211393E-9 > 1.0E-10 with 2.497793466520835 vs 2.497793508624782.
ERROR: Relative error 1.0284037808289523E-8 > 1.0E-10 with 0.8600743385970706 vs 0.8600743209069968.
Testing the 2D convolution transpose function.
Testing the cross-entropy loss function with zero-valued predictions.
Testing the im2col and col2im functions.
Testing the 2D max pooling functions.
 - Testing w/ padh=0 & padw=0.
 - Testing w/ padh=0 & padw=1.
 - Testing w/ padh=0 & padw=2.
 - Testing w/ padh=0 & padw=3.
 - Testing w/ padh=1 & padw=0.
 - Testing w/ padh=1 & padw=1.
 - Testing w/ padh=1 & padw=2.
 - Testing w/ padh=1 & padw=3.
 - Testing w/ padh=2 & padw=0.
 - Testing w/ padh=2 & padw=1.
 - Testing w/ padh=2 & padw=2.
 - Testing w/ padh=2 & padw=3.
 - Testing w/ padh=3 & padw=0.
 - Testing w/ padh=3 & padw=1.
 - Testing w/ padh=3 & padw=2.
 - Testing w/ padh=3 & padw=3.
 - Testing for correct behavior against known answer w/ pad=0.
 - Testing for correct behavior against known answer w/ pad=1.
 - Testing for correct behavior against known answer w/ all negative matrix w/ pad=0.
 - Testing for correct behavior against known answer w/ all negative matrix w/ pad=1.
Testing the padding and unpadding functions.
Testing the tanh forward function.
---
Other tests complete -- look for any ERRORs or WARNINGs.


17/05/30 17:26:25 INFO api.DMLScript: END DML run 05/30/2017 17:26:25
SystemML Statistics:
Total elapsed time:		126.751 sec.
Total compilation time:		2.136 sec.
Total execution time:		124.615 sec.
Number of compiled MR Jobs:	0.
Number of executed MR Jobs:	0.
CUDA/CuLibraries init time:	1.086/0.985 sec.
Number of executed GPU inst:	552273.
GPU mem tx time  (alloc/dealloc/set0/toDev/fromDev):	0.032/0.002/6.738/29.418/16.843 sec.
GPU mem tx count (alloc/dealloc/set0/toDev/fromDev/evict):	221/221/972795/532/402390/237544/0.
GPU conversion time  (sparseConv/sp2dense/dense2sp):	0.001/0.037/0.000 sec.
GPU conversion count (sparseConv/sp2dense/dense2sp):	532/561/0.
Cache hits (Mem, WB, FS, HDFS):	2296853/0/0/0.
Cache writes (WB, FS, HDFS):	23912/0/0.
Cache times (ACQr/m, RLS, EXP):	18.229/0.952/0.666/0.000 sec.
HOP DAGs recompiled (PRED, SB):	0/0.
HOP DAGs recompile time:	0.053 sec.
Functions recompiled:		6501.
Functions recompile time:	12.265 sec.
ParFor loops optimized:		1235.
ParFor optimize time:		2.541 sec.
ParFor initialize time:		0.092 sec.
ParFor result merge time:	0.003 sec.
ParFor total update in-place:	0/288348/367740
Total JIT compile time:		39.733 sec.
Total JVM GC count:		75.
Total JVM GC time:		0.248 sec.
LibMatrixDNN dense count (conv/bwdF/bwdD/im2col/maxBwd):	0/0/0/0/0.
LibMatrixDNN sparse count (conv/bwdF/bwdD/im2col/maxBwd):	0/0/0/0/0.
LibMatrixDNN conv(im2col/matmult), bwdF (im2col/matmult), bwdD (col2im/matmult) time:	0.000/0.000/0.000/0.000/0.000/0.000 sec.
Heavy hitter instructions:
   #  Instruction           Time(s)    Count  GPU
   1  forward               111.231     7027  
   2  lstm                   56.371        1  
   3  gpu_*                  34.008   208803  s2d[0.000s,2], mmck[5.535s,75808], msk[9.984s,132995], ao[1.938s,208803], H2D[15.177s,209727]
   4  conv2d_simple          19.747        1  
   5  rnn                    14.719        1  
   6  max_pool2d             12.343        2  
   7  gpu_ba+*               12.131    71493  H2D[7.586s,99457], Mdmdm[0.623s,43613], ao[0.721s,71493], Mddot[2.555s,27880]
   8  leftIndex              10.275   367740  
   9  conv2d                  7.951        2  
  10  gpu_+                   7.352    98170  s2d[0.035s,542], msk[0.196s,2787], ddgeaml[0.319s,24089], D2D[0.383s,39732], ao[0.975s,98170], H2D[2.294s,29303], mmck[2.367s,31562]
  11  gpu_-                   6.977    80792  mmck[0.153s,2067], ddgeaml[0.179s,13471], msk[4.702s,65254], H2D[0.482s,6076], ao[0.896s,80792]
  12  sigmoid                 5.261    88549  
  13  backward                4.320       44  
  14  gpu_uamax               4.054    15628  r[0.002s,15628], az[0.093s,15628], D2H[1.046s,15628], rallk[1.313s,15628], H2D[1.469s,15628]
  15  max_pool2d_simple       4.010        1  
  16  rmvar                   3.585  5694332  
  17  gpu_r'                  3.118    25395  ao[0.255s,25394], ddgeaml[0.303s,25394], H2D[2.411s,25395]
  18  batch_norm1d            2.906        2  
  19  rangeReIndex            2.716   507414  
  20  gpu_uarsqk+             2.296    12662  a[0.000s,1], r[0.003s,12661], az[0.058s,12662], ao[0.139s,12662], rrowk[0.883s,12662], msk[1.052s,12662]
  21  gpu_uak+                2.164    13551  a[0.000s,1], s2d[0.001s,9], H2D[0.004s,18], r[0.005s,13550], az[0.105s,13551], D2H[0.946s,13551], rallk[0.966s,13551]
  22  rshape                  2.057   345346  
  23  affine                  1.425        1  
  24  batch_norm2d            1.350        2  
  25  gpu_+*                  1.229     8949  daxpymv[0.000s,2], s2d[0.000s,2], D2D[0.078s,8947], daxpy[0.088s,8947], ao[0.088s,8949], H2D[0.907s,12024]
  26  createvar               0.922  1945781  
  27  scale_shift1d           0.773        1  
  28  rand                    0.759    67235  
  29  dropout                 0.546        1  
  30  gpu_uacvar              0.493     1312  r[0.000s,3929], a[0.001s,7], ao[0.014s,1312], az[0.019s,3936], mmck[0.091s,1312], rcolk[0.170s,2624], msk[0.173s,2624]
  31  gpu_/                   0.304     3325  H2D[0.000s,1], ao[0.037s,3325], msk[0.096s,1252], mmck[0.147s,2073]
  32  scale_shift2d           0.242        1  
  33  *                       0.238   978652  
  34  conv2d_builtin          0.232        1  
  35  gpu_sqrt                0.216     2626  ao[0.028s,2626], sqrtk[0.168s,2626]
  36  max_pool2d_builtin      0.204        1  
  37  gpu_bias_add            0.202     1700  s2d[0.000s,4], ao[0.019s,1700], H2D[0.021s,323], nnrbk[0.126s,1700]
  38  +                       0.190   753849  
  39  gpu_uacmean             0.177     1312  ao[0.015s,1312], H2D[0.060s,906], rcolk[0.092s,1312]
  40  check_rel_grad_error    0.177     6001  
  41  gpu_bias_multiply       0.175     1553  ao[0.017s,1553], H2D[0.025s,319], nnrbk[0.120s,1553]
  42  col2im_t259             0.165        2  
  43  ncol                    0.152   404715  
  44  gpu_uacmax              0.149     1144  ao[0.012s,1144], rcolk[0.119s,1144]
  45  -                       0.147   539042  
  46  cpvar                   0.138   563847  
  47  append                  0.127    40461  
  48  gpu_uark+               0.123      920  H2D[0.004s,15], ao[0.014s,920], rrowk[0.091s,920]
  49  im2col                  0.120        2  
  50  cross_entropy_loss      0.119        2  
  51  check_rel_error         0.118    18454  
  52  gpu_uarvar              0.115      310  r[0.000s,925], a[0.001s,5], ao[0.003s,310], az[0.004s,930], mmck[0.020s,310], msk[0.040s,620], rrowk[0.040s,620]
  53  conv2d_transpose        0.105        2  
  54  gpu_^2                  0.104      814  H2D[0.002s,30], ao[0.008s,814], msk[0.086s,814]
  55  gpu_uarmean             0.095      620  ao[0.006s,620], rrowk[0.041s,620], H2D[0.044s,620]
  56  col2im                  0.084        1  
  57  nrow                    0.081   162537  
  58  gpu_conv2d_bias_add     0.081      278  s2d[0.000s,2], nnc[0.000s,278], nni[0.003s,278], ao[0.003s,278], nncf[0.009s,278], H2D[0.020s,281], nnrbk[0.034s,278]
  59  tanh                    0.065        2  
  60  log_loss                0.064        1  
  61  softmax                 0.062        1  
  62  gpu_maxpooling          0.052      278  nnc[0.000s,278], nni[0.002s,278], ao[0.004s,278], nnmf[0.004s,278], H2D[0.037s,260]
  63  im2col_t26284           0.051        3  
  64  im2col_t26282           0.045        3  
  65  im2col_t228             0.043        2  
  66  im2col_t259             0.038        2  
  67  im2col_t25918           0.037        3  
  68  ==                      0.036     9393  
  69  im2col_t25920           0.035        3  
  70  im2col_t26193           0.035        3  
  71  castdts                 0.034    55103  
  72  im2col_t26191           0.034        3  
  73  relu                    0.033        1  
  74  im2col_t342             0.032        2  
  75  im2col_t26102           0.031        3  
  76  im2col_t26100           0.031        3  
  77  im2col_t25827           0.030        3  
  78  im2col_t25829           0.030        3  
  79  im2col_t26011           0.030        3  
  80  im2col_t434             0.030        2  
  81  im2col_t25192           0.029        3  
  82  im2col_t25554           0.029        3  
  83  im2col_t457             0.029        2  
  84  assignvar               0.029   144209  
  85  im2col_t17008           0.029        2  
  86  l2_reg                  0.029        1  
  87  im2col_t595             0.029        2  
  88  im2col_t25465           0.028        3  
  89  im2col_t4804            0.028        2  
  90  im2col_t388             0.028        2  
  91  im2col_t25556           0.027        3  
  92  im2col_t24923           0.027        3  
  93  im2col_t25190           0.027        3  
  94  l1_reg                  0.027        1  
  95  im2col_t25738           0.027        3  
  96  im2col_t25463           0.027        3  
  97  im2col_t25736           0.026        3  
  98  im2col_t26009           0.026        3  
  99  im2col_t365             0.026        2  
 100  im2col_t411             0.026        2  
{code}
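
For readers of the log above, here is a minimal sketch of how the reported numbers appear to be derived. It assumes a symmetric relative error of the form |a - b| / max(eps, |a| + |b|) and a central-difference gradient estimate (lossph - lossmh) / (2h). The function names, the `eps` guard, and the step size `h = 1e-5` are assumptions inferred from the logged values, not taken from the test scripts themselves.

```python
def relative_error(x1: float, x2: float, eps: float = 1e-8) -> float:
    """Symmetric relative error; eps guards against division by zero (assumed value)."""
    return abs(x1 - x2) / max(eps, abs(x1) + abs(x2))


def central_difference(loss_plus: float, loss_minus: float, h: float) -> float:
    """Numerical gradient estimate from the losses at (x + h) and (x - h)."""
    return (loss_plus - loss_minus) / (2.0 * h)


# One value pair from the "Testing the 2D convolution functions" ERROR lines.
a, b = 1.9947644445978157, 1.99476444252298
rel = relative_error(a, b)
print(rel, rel > 1e-10)  # the logged threshold for this test is 1.0E-10

# One gradient-check WARNING line: analytical gradient, lossph, lossmh.
analytical = -11.682479557456533
numerical = central_difference(40.510115394324195, 40.51034918713648, h=1e-5)
grad_rel = relative_error(analytical, numerical)
print(numerical, grad_rel, 1e-4 < grad_rel <= 1e-2)
```

Under these assumptions the recomputed values agree with the log to the printed precision, which suggests the ERROR lines reflect differences around 1e-8 between two double-precision results compared against a 1e-10 tolerance.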



> GPU cudnn produces worrisome amount of numerical instability
> ------------------------------------------------------------
>
>                 Key: SYSTEMML-1650
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1650
>             Project: SystemML
>          Issue Type: Bug
>          Components: Runtime
>    Affects Versions: SystemML 0.14
>            Reporter: Nakul Jindal
>             Fix For: SystemML 1.0
>
>
> When running the GPU tests (Mike's run_tests.dml in the nn directory), the following output is produced:
> {code}
> 17/05/30 17:24:19 INFO api.DMLScript: BEGIN DML run 05/30/2017 17:24:19
> 17/05/30 17:24:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 17/05/30 17:24:21 INFO context.GPUContext: Initializing CUDA
> Starting grad checks.
> ---
> 17/05/30 17:24:22 INFO context.GPUContext:  GPU memory - Total: 2096.300032 MB, Available: 1295.9743999999998 MB on GPUContext{deviceNum=0}
> 17/05/30 17:24:22 INFO context.GPUContext: Total number of GPUs on the machine: 1
> Grad checking the cross-entropy loss function.
> Grad checking the L1 loss function.
> Grad checking the L1 regularization function.
> Grad checking the L2 loss function.
> Grad checking the L2 regularization function.
> Grad checking the log loss function.
> Grad checking the affine layer with L2 loss.
>  - Grad checking X.
>  - Grad checking W.
>  - Grad checking b.
> Grad checking the 1D batch normalization layer with L2 loss.
>  - Grad checking the 'train' mode.
>    - Grad checking X.
>    - Grad checking gamma.
>    - Grad checking beta.
>  - Grad checking the 'test' mode.
>    - Grad checking X.
>    - Grad checking gamma.
>    - Grad checking beta.
> Grad checking the 2D (spatial) batch normalization layer with L2 loss.
>  - Grad checking the 'train' mode.
>    - Grad checking X.
>    - Grad checking gamma.
>    - Grad checking beta.
>  - Grad checking the 'test' mode.
>    - Grad checking X.
>    - Grad checking gamma.
>    - Grad checking beta.
> Grad checking the `im2col` 2D convolutional layer with L2 loss.
> 17/05/30 17:24:28 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
> 17/05/30 17:24:28 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
>  - Grad checking X.
>  - Grad checking W.
>  - Grad checking b.
> Grad checking the built-in 2D convolutional layer with L2 loss.
>  - Grad checking X.
>  - Grad checking W.
> WARNING: Relative error 3.063931109511093E-4 > 1.0E-4 & <= 0.01 with -11.682479557456533 analytical vs -11.689640614065409 numerical, with lossph 40.510115394324195 and lossmh 40.51034918713648
> WARNING: Relative error 6.785572589631694E-4 > 1.0E-4 & <= 0.01 with -14.363880156229683 analytical vs -14.383386822913733 numerical, with lossph 40.510088543924184 and lossmh 40.51037621166064
> WARNING: Relative error 8.117464157218959E-4 > 1.0E-4 & <= 0.01 with -13.400658690617757 analytical vs -13.378920463225084 numerical, with lossph 40.51009898805432 and lossmh 40.51036656646358
> WARNING: Relative error 6.785567321010216E-4 > 1.0E-4 & <= 0.01 with -14.37300870216048 analytical vs -14.39252775057298 numerical, with lossph 40.510088452456074 and lossmh 40.510376303011085
> WARNING: Relative error 0.0023065358169588085 > 1.0E-4 & <= 0.01 with -15.081214796672182 analytical vs -15.011804170583785 numerical, with lossph 40.510081360786614 and lossmh 40.510381596870026
> WARNING: Relative error 1.2020843619724922E-4 > 1.0E-4 & <= 0.01 with -14.602099111310885 analytical vs -14.60561012436301 numerical, with lossph 40.51008637609418 and lossmh 40.510378488296666
> WARNING: Relative error 3.063921242335014E-4 > 1.0E-4 & <= 0.01 with -11.654549775926586 analytical vs -11.66169368929104 numerical, with lossph 40.51011567395115 and lossmh 40.510348907824934
>  - Grad checking b.
> Grad checking the simple reference 2D convolutional layer with L2 loss.
>  - Grad checking X.
>  - Grad checking W.
>  - Grad checking b.
> Grad checking the 2D convolution transpose layer with L2 loss.
>  - Grad checking X.
> WARNING: Relative error 6.785553488451468E-4 > 1.0E-4 & <= 0.01 with 0.2480096633484488 analytical vs 0.2483464684566172 numerical, with lossph 8.25432627928163 and lossmh 8.25432131235226
> WARNING: Relative error 8.117342148227497E-4 > 1.0E-4 & <= 0.01 with 0.46178385247729725 analytical vs 0.4610347690281457 numerical, with lossph 8.254328419943578 and lossmh 8.254319199248197
> WARNING: Relative error 6.78555370922306E-4 > 1.0E-4 & <= 0.01 with 0.5511303465906289 analytical vs 0.5518787993707974 numerical, with lossph 8.254329314621874 and lossmh 8.254318277045886
> WARNING: Relative error 8.117020436730868E-4 > 1.0E-4 & <= 0.01 with 0.13829553169194606 analytical vs 0.13807120424758068 numerical, with lossph 8.254325180655963 and lossmh 8.254322419231878
> WARNING: Relative error 8.117328203683862E-4 > 1.0E-4 & <= 0.01 with 0.5055309144436196 analytical vs 0.5047108680322765 numerical, with lossph 8.25432885801433 and lossmh 8.25431876379697
> WARNING: Relative error 8.116455308945274E-4 > 1.0E-4 & <= 0.01 with 0.06899396037823916 analytical vs 0.06888205392741042 numerical, with lossph 8.254324486699112 and lossmh 8.254323109058033
> WARNING: Relative error 6.785554871822532E-4 > 1.0E-4 & <= 0.01 with -0.13350593809809497 analytical vs -0.1336872434976044 numerical, with lossph 8.254322458935555 and lossmh 8.254325132680425
> WARNING: Relative error 6.785552242935504E-4 > 1.0E-4 & <= 0.01 with -0.2724052650635402 analytical vs -0.2727752001163708 numerical, with lossph 8.254321068071604 and lossmh 8.254326523575607
> WARNING: Relative error 6.785555175584701E-4 > 1.0E-4 & <= 0.01 with -0.2904759680044567 analytical vs -0.29087044381981286 numerical, with lossph 8.254320887119167 and lossmh 8.254326704528044
> WARNING: Relative error 8.117365268798438E-4 > 1.0E-4 & <= 0.01 with 0.3720728122335215 analytical vs 0.37146925198072717 numerical, with lossph 8.254327521604598 and lossmh 8.254320092219558
> WARNING: Relative error 8.117996956316842E-4 > 1.0E-4 & <= 0.01 with 0.14788594799412863 analytical vs 0.1476460352201059 numerical, with lossph 8.254325267869831 and lossmh 8.254322314949126
> WARNING: Relative error 8.119012708962542E-4 > 1.0E-4 & <= 0.01 with -0.07795973031927872 analytical vs -0.07783324180721252 numerical, with lossph 8.254323015177858 and lossmh 8.254324571842695
> WARNING: Relative error 8.117353519435853E-4 > 1.0E-4 & <= 0.01 with 0.48348549268368723 analytical vs 0.48270120478477446 numerical, with lossph 8.254328637254696 and lossmh 8.2543189832306
> WARNING: Relative error 6.785553268852095E-4 > 1.0E-4 & <= 0.01 with 0.4883649684844016 analytical vs 0.48902818381435503 numerical, with lossph 8.254328686121752 and lossmh 8.254318905558076
> WARNING: Relative error 8.117788275473475E-4 > 1.0E-4 & <= 0.01 with 0.3804938800778617 analytical vs 0.37987662739880074 numerical, with lossph 8.254327583249065 and lossmh 8.254319985716517
> WARNING: Relative error 6.785547468313855E-4 > 1.0E-4 & <= 0.01 with 0.07631309322132188 analytical vs 0.07641672876701477 numerical, with lossph 8.254324560007202 and lossmh 8.254323031672627
> WARNING: Relative error 6.785553971200803E-4 > 1.0E-4 & <= 0.01 with -0.342925836087173 analytical vs -0.34339154044715764 numerical, with lossph 8.254320361892585 and lossmh 8.254327229723394
> WARNING: Relative error 6.785551565868137E-4 > 1.0E-4 & <= 0.01 with -0.02491798838300011 analytical vs -0.02495182780393179 numerical, with lossph 8.254323546295183 and lossmh 8.254324045331739
> WARNING: Relative error 8.117758660292993E-4 > 1.0E-4 & <= 0.01 with -0.29017371666857017 analytical vs -0.2897029867554579 numerical, with lossph 8.254320890135642 and lossmh 8.254326684195377
> WARNING: Relative error 6.785553229289772E-4 > 1.0E-4 & <= 0.01 with -0.5072646368526688 analytical vs -0.5079535185359418 numerical, with lossph 8.254318716279274 and lossmh 8.254328875349644
> WARNING: Relative error 6.78554778315526E-4 > 1.0E-4 & <= 0.01 with 0.03126357863101518 analytical vs 0.031306035541689425 numerical, with lossph 8.254324108883962 and lossmh 8.254323482763251
> WARNING: Relative error 6.785553099932536E-4 > 1.0E-4 & <= 0.01 with -0.2942630860026224 analytical vs -0.2946627047251127 numerical, with lossph 8.254320849187412 and lossmh 8.254326742441506
> WARNING: Relative error 6.785551200008403E-4 > 1.0E-4 & <= 0.01 with 0.08135480331223598 analytical vs 0.08146528571728595 numerical, with lossph 8.254324610476463 and lossmh 8.254322981170748
>  - Grad checking W.
> WARNING: Relative error 1.2021043996600452E-4 > 1.0E-4 & <= 0.01 with -0.6822178752109117 analytical vs -0.6823819143519926 numerical, with lossph 8.254316974560194 and lossmh 8.25433062219848
> WARNING: Relative error 3.0638425696629187E-4 > 1.0E-4 & <= 0.01 with -1.4508166976784973 analytical vs -1.4517059849339373 numerical, with lossph 8.254309268100457 and lossmh 8.254338302220155
> WARNING: Relative error 8.117943113692949E-4 > 1.0E-4 & <= 0.01 with 0.7308501955876874 analytical vs 0.7296645580190385 numerical, with lossph 8.254331070748734 and lossmh 8.254316477457573
> WARNING: Relative error 8.117065202066276E-4 > 1.0E-4 & <= 0.01 with 0.9946502709906734 analytical vs 0.9930368523924925 numerical, with lossph 8.254333755980372 and lossmh 8.254313895243325
> WARNING: Relative error 3.063898473026446E-4 > 1.0E-4 & <= 0.01 with -1.9666799496457117 analytical vs -1.967885460540941 numerical, with lossph 8.254304102377066 and lossmh 8.254343460086277
> WARNING: Relative error 3.063873162245538E-4 > 1.0E-4 & <= 0.01 with -1.4461219772325682 analytical vs -1.4470083956830135 numerical, with lossph 8.254309315051604 and lossmh 8.254338255219517
> WARNING: Relative error 1.2020330860452122E-4 > 1.0E-4 & <= 0.01 with -1.2286387562580745 analytical vs -1.228934164654305 numerical, with lossph 8.254311502036288 and lossmh 8.25433608071958
> WARNING: Relative error 6.785553661541439E-4 > 1.0E-4 & <= 0.01 with 0.08818615423020833 analytical vs 0.08830591387010144 numerical, with lossph 8.254324678957262 and lossmh 8.254322912838985
>  - Grad checking b.
> Grad checking the (inverted) dropout layer with L2 loss.
> Grad checking the LSTM layer with L2 loss.
>  - Grad checking X.
>  - Grad checking W.
>  - Grad checking b.
>  - Grad checking out0.
>  - Grad checking c0.
> Grad checking the 2D max pooling layer with L2 loss.
>  - Grad checking w/ pad=0.
>  - Grad checking w/ pad=1.
> Grad checking the built-in 2D max pooling layer with L2 loss.
>  - Grad checking w/ pad=0.
>  - Grad checking w/ pad=1.
> Grad checking the simple reference 2D max pooling layer with L2 loss.
>  - Grad checking w/ pad=0.
>  - Grad checking w/ pad=1.
> Grad checking the ReLU nonlinearity layer with L2 loss.
> Grad checking the simple RNN layer with L2 loss.
>  - Grad checking X.
>  - Grad checking W.
>  - Grad checking b.
>  - Grad checking out0.
> Grad checking the 1D scale & shift layer with L2 loss.
>  - Grad checking X.
>  - Grad checking gamma.
>  - Grad checking beta.
> Grad checking the 2D scale & shift layer with L2 loss.
>  - Grad checking X.
>  - Grad checking gamma.
>  - Grad checking beta.
> Grad checking the sigmoid nonlinearity layer with L2 loss.
> Grad checking the softmax layer with L2 loss.
> Grad checking the tanh nonlinearity layer with L2 loss.
> ---
> Grad checks complete -- look for any ERRORs or WARNINGs.
> If any tests involving ReLUs failed, try a few times to ensure that they were not false negatives due to kinks being crossed.
> Starting other tests.
> ---
> Testing the 1D batch normalization function.
> Testing the 2D (spatial) batch normalization function.
> Testing the 2D convolution functions.
> ERROR: Relative error 1.4275242179409038E-8 > 1.0E-10 with 0.2613148690102816 vs 0.26131486154961564.
> ERROR: Relative error 5.19998536442815E-10 > 1.0E-10 with -1.0332757339265042 vs -1.033275735001108.
> ERROR: Relative error 1.6690477019584457E-9 > 1.0E-10 with -0.2796655022366367 vs -0.2796655013030866.
> ERROR: Relative error 4.026622598319469E-8 > 1.0E-10 with -0.5810464384666573 vs -0.5810463916735648.
> ERROR: Relative error 1.5925147093041632E-8 > 1.0E-10 with 0.3443370985342769 vs 0.34433710950151497.
> ERROR: Relative error 1.527092464737425E-8 > 1.0E-10 with 0.29907123360418936 vs 0.29907122447000095.
> ERROR: Relative error 1.6981187364236183E-8 > 1.0E-10 with -1.3172908215225392 vs -1.3172908662608644.
> ERROR: Relative error 1.1249863733341583E-9 > 1.0E-10 with 2.372276123264216 vs 2.3722761179266594.
> ERROR: Relative error 2.0761911589432542E-8 > 1.0E-10 with 1.2022032376875469 vs 1.2022031877674733.
> ERROR: Relative error 1.565994385459594E-8 > 1.0E-10 with -3.2028287262088755 vs -3.202828826521113.
> ERROR: Relative error 1.6864676187009944E-8 > 1.0E-10 with -1.1617410552745209 vs -1.1617410160897481.
> ERROR: Relative error 6.7621761573523795E-9 > 1.0E-10 with -2.357698874691257 vs -2.3576989065776073.
> ERROR: Relative error 8.077058791206047E-9 > 1.0E-10 with -0.826672150112067 vs -0.8266721634662262.
> ERROR: Relative error 3.126862662452838E-7 > 1.0E-10 with 0.13533522445012097 vs 0.13533530908507949.
> ERROR: Relative error 1.4938685572516403E-8 > 1.0E-10 with 1.6979405686913527 vs 1.697940619421354.
> ERROR: Relative error 1.5016745260435074E-9 > 1.0E-10 with -1.1265715161920746 vs -1.1265715128085871.
> ERROR: Relative error 1.57421411011751E-8 > 1.0E-10 with -1.4288550242468203 vs -1.4288549792603462.
> ERROR: Relative error 5.967223900831169E-9 > 1.0E-10 with 3.3608783218897167 vs 3.360878361999944.
> ERROR: Relative error 2.6680716508589268E-8 > 1.0E-10 with -1.017766223123622 vs -1.0177662774330876.
> ERROR: Relative error 2.982949129497961E-8 > 1.0E-10 with -0.36147647765070573 vs -0.36147649921602526.
> ERROR: Relative error 3.657840826605735E-8 > 1.0E-10 with 0.45134768685895427 vs 0.4513477198781154.
> ERROR: Relative error 1.4907379969827675E-8 > 1.0E-10 with -1.5573016894495448 vs -1.5573017358801216.
> ERROR: Relative error 1.5786845592838222E-8 > 1.0E-10 with 0.14299706440454166 vs 0.14299706891948688.
> ERROR: Relative error 5.200703379619866E-10 > 1.0E-10 with 1.9947644445978157 vs 1.99476444252298.
> ERROR: Relative error 6.52328960308163E-9 > 1.0E-10 with -0.9699901535166611 vs -0.9699901661717145.
> ERROR: Relative error 1.1700565471480131E-8 > 1.0E-10 with 0.6822547438763443 vs 0.6822547598418771.
> ERROR: Relative error 1.7446876565463836E-10 > 1.0E-10 with 1.0078137191134946 vs 1.0078137194651586.
> ERROR: Relative error 1.1440123355883945E-8 > 1.0E-10 with 0.2932107414338165 vs 0.29321074814255066.
> ERROR: Relative error 3.136300157201714E-8 > 1.0E-10 with -0.12095057939678663 vs -0.1209505869835333.
> ERROR: Relative error 6.045675521257672E-9 > 1.0E-10 with -1.6284907233105383 vs -1.6284907036198855.
> ERROR: Relative error 1.9960135534777884E-7 > 1.0E-10 with 0.05796215971397543 vs 0.05796218285263132.
> ERROR: Relative error 1.4599148103004017E-8 > 1.0E-10 with -1.5695035918656892 vs -1.5695036376925207.
> ERROR: Relative error 6.543527908682189E-9 > 1.0E-10 with -1.0879498840683728 vs -1.0879498983064337.
> ERROR: Relative error 2.863818344889719E-8 > 1.0E-10 with 0.986026344823579 vs 0.9860264012995873.
> ERROR: Relative error 1.1056654568999266E-8 > 1.0E-10 with 0.24707866293721786 vs 0.24707866840094478.
> ERROR: Relative error 6.455137924885195E-8 > 1.0E-10 with -0.40055946263778364 vs -0.4005594109244554.
> ERROR: Relative error 1.639718543077429E-8 > 1.0E-10 with -2.1781136614834082 vs -2.178113590053542.
> ERROR: Relative error 4.8798430636206827E-8 > 1.0E-10 with 0.05076196924826959 vs 0.0507619742024787.
> ERROR: Relative error 9.165908471957055E-9 > 1.0E-10 with 1.8794282546494758 vs 1.8794282201961414.
> ERROR: Relative error 2.978816706402979E-8 > 1.0E-10 with 0.3975604976144726 vs 0.39756052129967034.
> ERROR: Relative error 2.963621786230762E-8 > 1.0E-10 with 0.6415698417229411 vs 0.6415698797503494.
> ERROR: Relative error 2.3109709555514415E-8 > 1.0E-10 with -1.1982869673393794 vs -1.1982870227235083.
> ERROR: Relative error 3.2115462610775645E-10 > 1.0E-10 with -2.0857041042357167 vs -2.0857041055753838.
> ERROR: Relative error 9.948630979193247E-8 > 1.0E-10 with -0.10622623825604993 vs -0.10622625939216493.
> ERROR: Relative error 6.778314445963566E-8 > 1.0E-10 with 0.2196231226007214 vs 0.21962309282723172.
> ERROR: Relative error 2.5809976150965856E-9 > 1.0E-10 with 2.588105950679918 vs 2.5881059640397086.
> ERROR: Relative error 9.087554361084105E-8 > 1.0E-10 with 0.09544286378739902 vs 0.09544284644055634.
> ERROR: Relative error 1.8858614351612048E-8 > 1.0E-10 with -0.5589644431314431 vs -0.5589644220488538.
> ERROR: Relative error 7.736763250891607E-8 > 1.0E-10 with 0.04561424549587588 vs 0.04561425255400879.
> ERROR: Relative error 2.4394587006452004E-8 > 1.0E-10 with 0.8068575646006609 vs 0.806857603966576.
> ERROR: Relative error 1.2147537773849554E-8 > 1.0E-10 with -1.2798666770117564 vs -1.2798666459172992.
> ERROR: Relative error 4.395188501268004E-9 > 1.0E-10 with 0.77094736022815 vs 0.7709473534512321.
> ERROR: Relative error 4.273504230193166E-8 > 1.0E-10 with -0.04529170878696899 vs -0.04529171265805534.
> ERROR: Relative error 5.193072042217285E-9 > 1.0E-10 with 0.26683412238923976 vs 0.26683411961786213.
> ERROR: Relative error 5.044924487623104E-9 > 1.0E-10 with -0.743458732252303 vs -0.7434587247509167.
> ERROR: Relative error 1.7148401756786993E-9 > 1.0E-10 with 0.8849261094606061 vs 0.8849261064255924.
> ERROR: Relative error 5.823286681596099E-10 > 1.0E-10 with -2.926710388969381 vs -2.9267103855607663.
> ERROR: Relative error 4.8325979184277686E-9 > 1.0E-10 with 1.8593052545877329 vs 1.8593052725582824.
> ERROR: Relative error 6.601979216797429E-8 > 1.0E-10 with 0.5235158749686739 vs 0.523515944093497.
> ERROR: Relative error 3.434129104490014E-8 > 1.0E-10 with -1.062474946283946 vs -1.0624750192574712.
> ERROR: Relative error 1.8295696384999767E-8 > 1.0E-10 with -0.2937149434414278 vs -0.29371493269398913.
> ERROR: Relative error 1.0063200936207417E-9 > 1.0E-10 with 1.119269021693839 vs 1.1192690239465248.
> ERROR: Relative error 4.809906625420993E-9 > 1.0E-10 with 1.5742089463824267 vs 1.5742089615260229.
> ERROR: Relative error 9.943604666704722E-9 > 1.0E-10 with 0.8775827136936241 vs 0.8775827311462954.
> ERROR: Relative error 6.3483134114491796E-9 > 1.0E-10 with -0.698009708220039 vs -0.6980096993576703.
> ERROR: Relative error 4.535157418604792E-8 > 1.0E-10 with -0.1749102977141979 vs -0.17491028184928392.
> ERROR: Relative error 2.2360276379519686E-9 > 1.0E-10 with -0.8108452879638077 vs -0.8108452843376628.
> ERROR: Relative error 1.0374420129323381E-8 > 1.0E-10 with -0.8834414065663451 vs -0.8834413882359606.
> ERROR: Relative error 1.1878019489316737E-9 > 1.0E-10 with 2.532993506117655 vs 2.5329935121350444.
> ERROR: Relative error 2.70754303169682E-7 > 1.0E-10 with 0.0717457323097893 vs 0.07174577116073133.
> ERROR: Relative error 9.53853266322461E-10 > 1.0E-10 with -0.6740828968349293 vs -0.6740828955489769.
> ERROR: Relative error 5.7021836168127075E-9 > 1.0E-10 with -1.2861467707448162 vs -1.2861467854125064.
> ERROR: Relative error 7.72738725803068E-9 > 1.0E-10 with 0.6463938706474603 vs 0.6463938806373319.
> ERROR: Relative error 1.6274098482424904E-8 > 1.0E-10 with 0.3306564558987715 vs 0.33065646666104315.
> ERROR: Relative error 3.103682986912765E-8 > 1.0E-10 with -0.6970667742796319 vs -0.697066817549119.
> ERROR: Relative error 1.535686164274709E-8 > 1.0E-10 with 0.701665021104026 vs 0.701664999553281.
> ERROR: Relative error 4.100056878916496E-9 > 1.0E-10 with -2.1103616009334836 vs -2.110361618238689.
> ERROR: Relative error 8.123120975873666E-9 > 1.0E-10 with 1.2604178829093742 vs 1.2604179033864282.
> ERROR: Relative error 1.0941209833039406E-8 > 1.0E-10 with 1.2266289837112507 vs 1.2266290105528612.
> ERROR: Relative error 5.541442492773271E-7 > 1.0E-10 with -0.012777216573308391 vs -0.012777202412474067.
> ERROR: Relative error 4.2253762431563665E-9 > 1.0E-10 with -1.339045610669553 vs -1.339045621985496.
> ERROR: Relative error 8.360566732946797E-8 > 1.0E-10 with 0.1561570755476005 vs 0.1561571016588357.
> ERROR: Relative error 2.767267007771319E-9 > 1.0E-10 with -2.6167992977337984 vs -2.616799312216563.
> ERROR: Relative error 1.7289233542875815E-8 > 1.0E-10 with 1.0579964137565983 vs 1.057996450340493.
> ERROR: Relative error 1.7154762078446898E-8 > 1.0E-10 with -0.5334469910041575 vs -0.5334469727018454.
> ERROR: Relative error 3.110409084496228E-9 > 1.0E-10 with 1.7670386817474821 vs 1.7670386707550558.
> ERROR: Relative error 1.1173482982155575E-8 > 1.0E-10 with -1.1253843105032257 vs -1.125384335652151.
> ERROR: Relative error 9.285717485307979E-9 > 1.0E-10 with -1.1188404636483114 vs -1.1188404428698386.
> ERROR: Relative error 2.1555251644095785E-9 > 1.0E-10 with 1.1254590576532526 vs 1.1254590625051633.
> ERROR: Relative error 2.319234552374325E-8 > 1.0E-10 with 0.9556977590880147 vs 0.955697803417761.
> ERROR: Relative error 4.4763326839679915E-9 > 1.0E-10 with 1.591009326384972 vs 1.5910093406287462.
> ERROR: Relative error 9.487915016929957E-9 > 1.0E-10 with 0.5884221509707697 vs 0.588422139804971.
> ERROR: Relative error 1.7764401531913022E-8 > 1.0E-10 with 0.916673465359319 vs 0.9166734979276306.
> ERROR: Relative error 2.3563316249488676E-8 > 1.0E-10 with -0.13562099666658628 vs -0.13562099027522556.
> ERROR: Relative error 4.394744814951474E-9 > 1.0E-10 with 2.0302516382886835 vs 2.0302516561335593.
> ERROR: Relative error 6.460001556392016E-10 > 1.0E-10 with 1.3949844117223749 vs 1.3949844099200546.
> ERROR: Relative error 8.778931914804406E-9 > 1.0E-10 with -1.2093561879421602 vs -1.2093562091758716.
> ERROR: Relative error 1.3404298361036346E-9 > 1.0E-10 with -1.7649810877732333 vs -1.7649810830415666.
> ERROR: Relative error 8.428228195211393E-9 > 1.0E-10 with 2.497793466520835 vs 2.497793508624782.
> ERROR: Relative error 1.0284037808289523E-8 > 1.0E-10 with 0.8600743385970706 vs 0.8600743209069968.
> Testing the 2D convolution transpose function.
> Testing the cross-entropy loss function with zero-valued predictions.
> Testing the im2col and col2im functions.
> Testing the 2D max pooling functions.
>  - Testing w/ padh=0 & padw=0.
>  - Testing w/ padh=0 & padw=1.
>  - Testing w/ padh=0 & padw=2.
>  - Testing w/ padh=0 & padw=3.
>  - Testing w/ padh=1 & padw=0.
>  - Testing w/ padh=1 & padw=1.
>  - Testing w/ padh=1 & padw=2.
>  - Testing w/ padh=1 & padw=3.
>  - Testing w/ padh=2 & padw=0.
>  - Testing w/ padh=2 & padw=1.
>  - Testing w/ padh=2 & padw=2.
>  - Testing w/ padh=2 & padw=3.
>  - Testing w/ padh=3 & padw=0.
>  - Testing w/ padh=3 & padw=1.
>  - Testing w/ padh=3 & padw=2.
>  - Testing w/ padh=3 & padw=3.
>  - Testing for correct behavior against known answer w/ pad=0.
>  - Testing for correct behavior against known answer w/ pad=1.
>  - Testing for correct behavior against known answer w/ all negative matrix w/ pad=0.
>  - Testing for correct behavior against known answer w/ all negative matrix w/ pad=1.
> Testing the padding and unpadding functions.
> Testing the tanh forward function.
> ---
> Other tests complete -- look for any ERRORs or WARNINGs.
> 17/05/30 17:26:25 INFO api.DMLScript: END DML run 05/30/2017 17:26:25
> SystemML Statistics:
> Total elapsed time:		126.751 sec.
> Total compilation time:		2.136 sec.
> Total execution time:		124.615 sec.
> Number of compiled MR Jobs:	0.
> Number of executed MR Jobs:	0.
> CUDA/CuLibraries init time:	1.086/0.985 sec.
> Number of executed GPU inst:	552273.
> GPU mem tx time  (alloc/dealloc/set0/toDev/fromDev):	0.032/0.002/6.738/29.418/16.843 sec.
> GPU mem tx count (alloc/dealloc/set0/toDev/fromDev/evict):	221/221/972795/532/402390/237544/0.
> GPU conversion time  (sparseConv/sp2dense/dense2sp):	0.001/0.037/0.000 sec.
> GPU conversion count (sparseConv/sp2dense/dense2sp):	532/561/0.
> Cache hits (Mem, WB, FS, HDFS):	2296853/0/0/0.
> Cache writes (WB, FS, HDFS):	23912/0/0.
> Cache times (ACQr/m, RLS, EXP):	18.229/0.952/0.666/0.000 sec.
> HOP DAGs recompiled (PRED, SB):	0/0.
> HOP DAGs recompile time:	0.053 sec.
> Functions recompiled:		6501.
> Functions recompile time:	12.265 sec.
> ParFor loops optimized:		1235.
> ParFor optimize time:		2.541 sec.
> ParFor initialize time:		0.092 sec.
> ParFor result merge time:	0.003 sec.
> ParFor total update in-place:	0/288348/367740
> Total JIT compile time:		39.733 sec.
> Total JVM GC count:		75.
> Total JVM GC time:		0.248 sec.
> LibMatrixDNN dense count (conv/bwdF/bwdD/im2col/maxBwd):	0/0/0/0/0.
> LibMatrixDNN sparse count (conv/bwdF/bwdD/im2col/maxBwd):	0/0/0/0/0.
> LibMatrixDNN conv(im2col/matmult), bwdF (im2col/matmult), bwdD (col2im/matmult) time:	0.000/0.000/0.000/0.000/0.000/0.000 sec.
> Heavy hitter instructions:
>    #  Instruction           Time(s)    Count  GPU
>    1  forward               111.231     7027  
>    2  lstm                   56.371        1  
>    3  gpu_*                  34.008   208803  s2d[0.000s,2], mmck[5.535s,75808], msk[9.984s,132995], ao[1.938s,208803], H2D[15.177s,209727]
>    4  conv2d_simple          19.747        1  
>    5  rnn                    14.719        1  
>    6  max_pool2d             12.343        2  
>    7  gpu_ba+*               12.131    71493  H2D[7.586s,99457], Mdmdm[0.623s,43613], ao[0.721s,71493], Mddot[2.555s,27880]
>    8  leftIndex              10.275   367740  
>    9  conv2d                  7.951        2  
>   10  gpu_+                   7.352    98170  s2d[0.035s,542], msk[0.196s,2787], ddgeaml[0.319s,24089], D2D[0.383s,39732], ao[0.975s,98170], H2D[2.294s,29303], mmck[2.367s,31562]
>   11  gpu_-                   6.977    80792  mmck[0.153s,2067], ddgeaml[0.179s,13471], msk[4.702s,65254], H2D[0.482s,6076], ao[0.896s,80792]
>   12  sigmoid                 5.261    88549  
>   13  backward                4.320       44  
>   14  gpu_uamax               4.054    15628  r[0.002s,15628], az[0.093s,15628], D2H[1.046s,15628], rallk[1.313s,15628], H2D[1.469s,15628]
>   15  max_pool2d_simple       4.010        1  
>   16  rmvar                   3.585  5694332  
>   17  gpu_r'                  3.118    25395  ao[0.255s,25394], ddgeaml[0.303s,25394], H2D[2.411s,25395]
>   18  batch_norm1d            2.906        2  
>   19  rangeReIndex            2.716   507414  
>   20  gpu_uarsqk+             2.296    12662  a[0.000s,1], r[0.003s,12661], az[0.058s,12662], ao[0.139s,12662], rrowk[0.883s,12662], msk[1.052s,12662]
>   21  gpu_uak+                2.164    13551  a[0.000s,1], s2d[0.001s,9], H2D[0.004s,18], r[0.005s,13550], az[0.105s,13551], D2H[0.946s,13551], rallk[0.966s,13551]
>   22  rshape                  2.057   345346  
>   23  affine                  1.425        1  
>   24  batch_norm2d            1.350        2  
>   25  gpu_+*                  1.229     8949  daxpymv[0.000s,2], s2d[0.000s,2], D2D[0.078s,8947], daxpy[0.088s,8947], ao[0.088s,8949], H2D[0.907s,12024]
>   26  createvar               0.922  1945781  
>   27  scale_shift1d           0.773        1  
>   28  rand                    0.759    67235  
>   29  dropout                 0.546        1  
>   30  gpu_uacvar              0.493     1312  r[0.000s,3929], a[0.001s,7], ao[0.014s,1312], az[0.019s,3936], mmck[0.091s,1312], rcolk[0.170s,2624], msk[0.173s,2624]
>   31  gpu_/                   0.304     3325  H2D[0.000s,1], ao[0.037s,3325], msk[0.096s,1252], mmck[0.147s,2073]
>   32  scale_shift2d           0.242        1  
>   33  *                       0.238   978652  
>   34  conv2d_builtin          0.232        1  
>   35  gpu_sqrt                0.216     2626  ao[0.028s,2626], sqrtk[0.168s,2626]
>   36  max_pool2d_builtin      0.204        1  
>   37  gpu_bias_add            0.202     1700  s2d[0.000s,4], ao[0.019s,1700], H2D[0.021s,323], nnrbk[0.126s,1700]
>   38  +                       0.190   753849  
>   39  gpu_uacmean             0.177     1312  ao[0.015s,1312], H2D[0.060s,906], rcolk[0.092s,1312]
>   40  check_rel_grad_error    0.177     6001  
>   41  gpu_bias_multiply       0.175     1553  ao[0.017s,1553], H2D[0.025s,319], nnrbk[0.120s,1553]
>   42  col2im_t259             0.165        2  
>   43  ncol                    0.152   404715  
>   44  gpu_uacmax              0.149     1144  ao[0.012s,1144], rcolk[0.119s,1144]
>   45  -                       0.147   539042  
>   46  cpvar                   0.138   563847  
>   47  append                  0.127    40461  
>   48  gpu_uark+               0.123      920  H2D[0.004s,15], ao[0.014s,920], rrowk[0.091s,920]
>   49  im2col                  0.120        2  
>   50  cross_entropy_loss      0.119        2  
>   51  check_rel_error         0.118    18454  
>   52  gpu_uarvar              0.115      310  r[0.000s,925], a[0.001s,5], ao[0.003s,310], az[0.004s,930], mmck[0.020s,310], msk[0.040s,620], rrowk[0.040s,620]
>   53  conv2d_transpose        0.105        2  
>   54  gpu_^2                  0.104      814  H2D[0.002s,30], ao[0.008s,814], msk[0.086s,814]
>   55  gpu_uarmean             0.095      620  ao[0.006s,620], rrowk[0.041s,620], H2D[0.044s,620]
>   56  col2im                  0.084        1  
>   57  nrow                    0.081   162537  
>   58  gpu_conv2d_bias_add     0.081      278  s2d[0.000s,2], nnc[0.000s,278], nni[0.003s,278], ao[0.003s,278], nncf[0.009s,278], H2D[0.020s,281], nnrbk[0.034s,278]
>   59  tanh                    0.065        2  
>   60  log_loss                0.064        1  
>   61  softmax                 0.062        1  
>   62  gpu_maxpooling          0.052      278  nnc[0.000s,278], nni[0.002s,278], ao[0.004s,278], nnmf[0.004s,278], H2D[0.037s,260]
>   63  im2col_t26284           0.051        3  
>   64  im2col_t26282           0.045        3  
>   65  im2col_t228             0.043        2  
>   66  im2col_t259             0.038        2  
>   67  im2col_t25918           0.037        3  
>   68  ==                      0.036     9393  
>   69  im2col_t25920           0.035        3  
>   70  im2col_t26193           0.035        3  
>   71  castdts                 0.034    55103  
>   72  im2col_t26191           0.034        3  
>   73  relu                    0.033        1  
>   74  im2col_t342             0.032        2  
>   75  im2col_t26102           0.031        3  
>   76  im2col_t26100           0.031        3  
>   77  im2col_t25827           0.030        3  
>   78  im2col_t25829           0.030        3  
>   79  im2col_t26011           0.030        3  
>   80  im2col_t434             0.030        2  
>   81  im2col_t25192           0.029        3  
>   82  im2col_t25554           0.029        3  
>   83  im2col_t457             0.029        2  
>   84  assignvar               0.029   144209  
>   85  im2col_t17008           0.029        2  
>   86  l2_reg                  0.029        1  
>   87  im2col_t595             0.029        2  
>   88  im2col_t25465           0.028        3  
>   89  im2col_t4804            0.028        2  
>   90  im2col_t388             0.028        2  
>   91  im2col_t25556           0.027        3  
>   92  im2col_t24923           0.027        3  
>   93  im2col_t25190           0.027        3  
>   94  l1_reg                  0.027        1  
>   95  im2col_t25738           0.027        3  
>   96  im2col_t25463           0.027        3  
>   97  im2col_t25736           0.026        3  
>   98  im2col_t26009           0.026        3  
>   99  im2col_t365             0.026        2  
>  100  im2col_t411             0.026        2  
> {code}
> Ping [~mwdusenb@us.ibm.com], [~niketanpansare]
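
For anyone reading the ERROR lines in the log above: the logged numbers are consistent with a symmetric relative error, |a - b| / (|a| + |b|), compared against a 1.0E-10 threshold. The sketch below reproduces one logged value under that assumption. The formula is inferred from the log output rather than copied from the SystemML test utilities, and the function names and the 1e-8 floor on the denominator are illustrative assumptions.

```python
def rel_error(x1: float, x2: float, floor: float = 1e-8) -> float:
    """Symmetric relative error: |x1 - x2| / max(floor, |x1| + |x2|).

    The floor (an assumption) only guards against division by zero
    when both inputs are zero.
    """
    return abs(x1 - x2) / max(floor, abs(x1) + abs(x2))


def exceeds_threshold(x1: float, x2: float, thresh: float = 1e-10) -> bool:
    """True when the pair would be reported as an ERROR at this threshold."""
    return rel_error(x1, x2) > thresh


# Value pair taken from one ERROR line in the log above.
a, b = -2.0857041042357167, -2.0857041055753838
err = rel_error(a, b)
print(f"Relative error {err} exceeds 1e-10: {exceeds_threshold(a, b)}")
```

With this definition, two values that agree to about eight significant digits give an error near 1e-8, which is the range most of the logged failures fall in.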



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
