madlib-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [madlib] njayaram2 commented on a change in pull request #361: Minibatch Preprocessor DL: Add optional num_classes param.
Date Tue, 02 Apr 2019 21:02:12 GMT
njayaram2 commented on a change in pull request #361: Minibatch Preprocessor DL: Add optional
num_classes param.
URL: https://github.com/apache/madlib/pull/361#discussion_r271494568
 
 

 ##########
 File path: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
 ##########
 @@ -363,21 +365,70 @@ class MiniBatchPreProcessorDL(MiniBatchPreProcessor):
 
         self._validate_args()
         self.num_of_buffers = self._get_num_buffers()
-        self.to_one_hot_encode = True
 
 Review comment:
   Our 1-hot encoding follows the standard one-hot encoding convention, and it differs
from `keras.utils.to_categorical`. For example, if there are 3 distinct class values captured in
a list `y=[10, 11, 12]`, then the 1-hot encoded vector created by `keras.utils.to_categorical(y)`
is of size 13 (largest class value + 1), because Keras treats each class value as an index.
If it is called as `keras.utils.to_categorical(y, num_classes=4)`, it errors out, since the
class values 10, 11 and 12 do not fit into 4 slots.
   The 1-hot encoding done in MADlib would create a 1-hot encoded vector of size 3 in the first
case (one slot per distinct class value) and of size 4 in the second case, when `num_classes=4`
is passed.
   
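   I would say Keras' 1-hot encoding is actually not the standard way of doing it, since it
only behaves as expected when the class values are already the integers 0 to `num_classes - 1`.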
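
   To make the comparison concrete, here is a minimal pure-Python sketch of the two conventions.
Both function names are hypothetical and neither is the Keras or MADlib implementation:
`index_based_one_hot` only imitates the label-as-index behaviour described above, and
`distinct_value_one_hot` assumes the width defaults to the number of distinct class values
and that an explicit `num_classes` widens the vector with trailing zeros.

```python
def index_based_one_hot(labels, num_classes=None):
    """Hypothetical imitation of the label-as-index convention.

    Each (non-negative integer) label is used directly as the position of the 1.
    """
    width = max(labels) + 1 if num_classes is None else num_classes
    rows = []
    for label in labels:
        row = [0] * width
        row[label] = 1  # raises IndexError when label >= width
        rows.append(row)
    return rows


def distinct_value_one_hot(labels, num_classes=None):
    """Hypothetical sketch of the distinct-value convention.

    Assumption: one slot per distinct label, and num_classes (if given)
    only adds all-zero slots at the end.
    """
    classes = sorted(set(labels))
    width = len(classes) if num_classes is None else num_classes
    if width < len(classes):
        raise ValueError("num_classes is smaller than the number of distinct labels")
    position = {value: index for index, value in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * width
        row[position[label]] = 1
        rows.append(row)
    return rows


y = [10, 11, 12]
print(len(index_based_one_hot(y)[0]))        # 13
print(len(distinct_value_one_hot(y)[0]))     # 3
print(len(distinct_value_one_hot(y, 4)[0]))  # 4
```

   Calling `index_based_one_hot(y, num_classes=4)` raises an `IndexError`, which mirrors the
failure mentioned above for `y=[10, 11, 12]`.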

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
