Github user njayaram2 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/272#discussion_r194151705
 Diff: doc/design/modules/neuralnetwork.tex 
@@ -117,6 +122,26 @@ \subsubsection{Backpropagation}
\[\boxed{\delta_{k}^j = \sum_{t=1}^{n_{k+1}} \left( \delta_{k+1}^t \cdot u_{k}^{jt} \right)
\cdot \phi'(\mathit{net}_{k}^j)}\]
where $k = 1,...,N-1$, and $j = 1,...,n_{k}$.
+\paragraph{Momentum updates.}
+Momentum\cite{momentum_ilya}\cite{momentum_cs231n} can help accelerate learning and avoid local minima when using gradient descent. We also support Nesterov's accelerated gradient due to its look-ahead characteristic. \\
+Here we need to introduce two new variables, namely velocity and momentum. Momentum must be in the range 0 to 1, where 0 means no momentum.
+The velocity is the same size as the coefficient and is accumulated in the direction of persistent reduction, which speeds up the optimization. The momentum value is responsible for damping the velocity and is analogous to the coefficient of friction. \\
+In classical momentum we first correct the velocity and then update the model with that velocity, whereas in Nesterov momentum we first move the model in the direction of momentum*velocity , then correct the velocity, and finally use the updated model to calculate the gradient. The main difference is that in classical momentum we compute the gradient before updating the model, whereas in Nesterov momentum we first update the model and then compute the gradient at the updated position.\\
 End diff 
`momentum*velocity ,` > `momentum*velocity,`. The extra space before the comma is
moving the `,` to the next line in the pdf.
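Not part of this diff, but for anyone reading along, the two update rules described in the paragraph above can be sketched in plain Python. This is an illustrative sketch, not MADlib code; the names `coeff`, `velocity`, `grad_fn`, `lr`, and `momentum` are assumptions chosen to mirror the prose:

```python
def classical_momentum_step(coeff, velocity, grad_fn, lr, momentum):
    # Classical momentum: compute the gradient at the CURRENT model,
    # correct the velocity, then update the model with that velocity.
    grad = grad_fn(coeff)
    velocity = momentum * velocity - lr * grad
    return coeff + velocity, velocity

def nesterov_momentum_step(coeff, velocity, grad_fn, lr, momentum):
    # Nesterov momentum: first move the model by momentum*velocity
    # (the "look ahead"), then compute the gradient at that updated
    # position and use it to correct the velocity and the model.
    lookahead = coeff + momentum * velocity
    grad = grad_fn(lookahead)
    velocity = momentum * velocity - lr * grad
    return lookahead - lr * grad, velocity

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
x, v = 5.0, 0.0
for _ in range(200):
    x, v = nesterov_momentum_step(x, v, lambda c: 2.0 * c,
                                  lr=0.1, momentum=0.9)
```

With momentum = 0 both steps reduce to plain gradient descent, which matches the "0 means no momentum" statement in the diff.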

