madlib-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [madlib] fmcquillan99 commented on pull request #525: DL: Model Hopper Refactor
Date Fri, 20 Nov 2020 18:57:37 GMT

fmcquillan99 commented on pull request #525:
URL: https://github.com/apache/madlib/pull/525#issuecomment-731350733


   (1)
   initial tests for functionality - keras_fit()
   
   ```
   DROP TABLE IF EXISTS cifar_10_model, cifar_10_model_summary;
   SELECT madlib.madlib_keras_fit('cifar_10_train_data_packed_allseg',   -- source table
                                  'cifar_10_model',                      -- model output table
                                  'model_arch_library',                  -- model arch table
                                   1,                                    -- model arch id
                                   $$ loss='categorical_crossentropy', optimizer='rmsprop(lr=0.0001, decay=1e-6)', metrics=['accuracy'] $$,  -- compile_params
                                   $$ batch_size=32, epochs=3 $$,        -- fit_params
                                   3,                                    -- num_iterations
                                   NULL,                                 -- use GPUs
                                   'cifar_10_test_data_packed_allseg',   -- validation dataset
                                   1                                     -- metrics compute frequency
                                 );
   ```
   produces warning:
   
   ```
   WARNING:  This version of tensorflow does not support XLA auto-cluster JIT optimization.
   HINT:  upgrading tensorflow may improve performance.  (seg0 slice1 10.128.0.41:40000 pid=6270)
   CONTEXT:  PL/Python function "fit_transition"
   WARNING:  This version of tensorflow does not support XLA auto-cluster JIT optimization.
   HINT:  upgrading tensorflow may improve performance.  (seg1 slice1 10.128.0.41:40001 pid=6271)
   CONTEXT:  PL/Python function "fit_transition"
   ```
   
   What does the user need to do to enable XLA?  I am currently on TF 1.13.1.
   
   Otherwise this ran, and warm start also seemed to work.
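   For reference, a warm start re-run would look something like the sketch below: the existing
   `cifar_10_model` table is kept (not dropped) and `warm_start` is passed as TRUE. The exact
   parameter values are illustrative, not necessarily the ones I ran.
   ```
   -- warm start sketch: reuse the existing cifar_10_model table rather than dropping it
   SELECT madlib.madlib_keras_fit('cifar_10_train_data_packed_allseg',   -- source table
                                  'cifar_10_model',                      -- existing model output table
                                  'model_arch_library',                  -- model arch table
                                   1,                                    -- model arch id
                                   $$ loss='categorical_crossentropy', optimizer='rmsprop(lr=0.0001, decay=1e-6)', metrics=['accuracy'] $$,  -- compile_params
                                   $$ batch_size=32, epochs=3 $$,        -- fit_params
                                   3,                                    -- num_iterations
                                   NULL,                                 -- use GPUs
                                   'cifar_10_test_data_packed_allseg',   -- validation dataset
                                   1,                                    -- metrics compute frequency
                                   TRUE                                  -- warm_start
                                 );
   ```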
   
   
   (2)
   initial tests for functionality - keras_fit_multiple_model()
   
   first I started with single segment:
   ```
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_train_data_packed ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               1 | {16667,32,32,3}       | {16667,10}          |         0
               1 | {16666,32,32,3}       | {16666,10}          |         2
               1 | {16667,32,32,3}       | {16667,10}          |         1
   (3 rows)
   
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_test_data_packed ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               1 | {10000,32,32,3}       | {10000,10}          |         0
   (1 row)
   ```
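   The multi-model fit below references an `mst_table`, which needs to exist beforehand. A rough
   sketch of how such a model selection table can be built with `madlib.load_model_selection_table`
   (the model ids and parameter lists here are placeholders, not necessarily what I used):
   ```
   DROP TABLE IF EXISTS mst_table, mst_table_summary;
   SELECT madlib.load_model_selection_table('model_arch_library',   -- model arch table
                                            'mst_table',            -- output model selection table
                                             ARRAY[1],              -- model ids to try (placeholder)
                                             ARRAY[                 -- compile params to try (placeholder)
                                               $$ loss='categorical_crossentropy', optimizer='Adam(lr=0.001)', metrics=['accuracy'] $$,
                                               $$ loss='categorical_crossentropy', optimizer='Adam(lr=0.0001)', metrics=['accuracy'] $$
                                             ],
                                             ARRAY[                 -- fit params to try (placeholder)
                                               $$ batch_size=32, epochs=1 $$,
                                               $$ batch_size=64, epochs=1 $$
                                             ]
                                            );
   ```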
   
   run multi fit:
   ```
   DROP TABLE IF EXISTS cifar10_multi_model, cifar10_multi_model_summary, cifar10_multi_model_info;
   SELECT madlib.madlib_keras_fit_multiple_model('cifar_10_train_data_packed',  -- source_table
                                                 'cifar10_multi_model',          -- model_output_table
                                                 'mst_table',                    -- model_selection_table
                                                  3,                             -- num_iterations
                                                  NULL,                          -- use gpus
                                                 'cifar_10_test_data_packed',    -- validation dataset
                                                  1,                             -- metrics compute frequency
                                                  NULL,                          -- warm_start
                                                  'me',                          -- name
                                                  'this is a test run'           -- description
                                                );
   ```
   
   produces error:
   ```
   ERROR:  plpy.Error: madlib_keras_fit_multiple_model error: No GPUs configured on hosts.  (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_fit_multiple_model", line 23, in <module>
       fit_obj = madlib_keras_fit_multiple_model.FitMultipleModel(**globals())
     PL/Python function "madlib_keras_fit_multiple_model", line 147, in __init__
     PL/Python function "madlib_keras_fit_multiple_model", line 295, in get_accessible_gpus_for_seg
   PL/Python function "madlib_keras_fit_multiple_model"
   
   ```
   
   So it looks like `use_gpus=NULL` now defaults to `TRUE`, but it should default to `FALSE`,
   i.e., CPUs, as it did before.
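   
   A possible workaround while this is being looked at is to pass `FALSE` explicitly for
   `use_gpus`; a sketch of the same call with only that parameter changed:
   ```
   DROP TABLE IF EXISTS cifar10_multi_model, cifar10_multi_model_summary, cifar10_multi_model_info;
   SELECT madlib.madlib_keras_fit_multiple_model('cifar_10_train_data_packed',  -- source_table
                                                 'cifar10_multi_model',          -- model_output_table
                                                 'mst_table',                    -- model_selection_table
                                                  3,                             -- num_iterations
                                                  FALSE,                         -- use gpus (explicitly CPU)
                                                 'cifar_10_test_data_packed',    -- validation dataset
                                                  1,                             -- metrics compute frequency
                                                  NULL,                          -- warm_start
                                                  'me',                          -- name
                                                  'this is a test run'           -- description
                                                );
   ```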
   
   
   (3)
   initial tests for functionality - keras_fit_multiple_model()
   
   next I used 2 segments:
   ```
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_train_data_packed_allseg ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               0 | {12500,32,32,3}       | {12500,10}          |         3
               0 | {12500,32,32,3}       | {12500,10}          |         1
               1 | {12500,32,32,3}       | {12500,10}          |         2
               1 | {12500,32,32,3}       | {12500,10}          |         0
   (4 rows)
   
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_test_data_packed_allseg ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               0 | {5000,32,32,3}        | {5000,10}           |         1
               1 | {5000,32,32,3}        | {5000,10}           |         0
   (2 rows)
   ```
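   (For context, packed tables like these typically come from the MADlib deep learning
   preprocessor. A rough sketch of such a call for CIFAR-10 follows; the source table and column
   names are placeholders, not necessarily the exact ones I used.)
   ```
   -- sketch only: how a packed training table like the above is typically produced
   DROP TABLE IF EXISTS cifar_10_train_data_packed_allseg, cifar_10_train_data_packed_allseg_summary;
   SELECT madlib.training_preprocessor_dl('cifar_10_train_data',                -- source table (placeholder name)
                                          'cifar_10_train_data_packed_allseg',  -- packed output table
                                          'y',                                  -- dependent var (placeholder name)
                                          'x',                                  -- independent var (placeholder name)
                                           NULL,                                -- buffer size (default)
                                           255,                                 -- normalizing const
                                           10                                   -- number of classes
                                         );
   ```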
   
   run multi fit:
   ```
   DROP TABLE IF EXISTS cifar10_multi_model, cifar10_multi_model_summary, cifar10_multi_model_info;
   SELECT madlib.madlib_keras_fit_multiple_model('cifar_10_train_data_packed_allseg',  -- source_table
                                                 'cifar10_multi_model',                 -- model_output_table
                                                 'mst_table',                           -- model_selection_table
                                                  3,                                    -- num_iterations
                                                  NULL,                                 -- use gpus
                                                 'cifar_10_test_data_packed_allseg',    -- validation dataset
                                                  1,                                    -- metrics compute frequency
                                                  NULL,                                 -- warm_start
                                                  'me',                                 -- name
                                                  'this is a test run'                  -- description
                                                );
   ```
   
   which produced error:
   ```
   ERROR:  plpy.SPIError: PRIMARY KEY and DISTRIBUTED BY definitions incompatible
   HINT:  When there is both a PRIMARY KEY, and a DISTRIBUTED BY clause, the DISTRIBUTED BY clause must be equal to or a left-subset of the PRIMARY KEY
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_fit_multiple_model", line 24, in <module>
       fit_obj.fit_multiple_model()
     PL/Python function "madlib_keras_fit_multiple_model", line 241, in fit_multiple_model
     PL/Python function "madlib_keras_fit_multiple_model", line 509, in init_model_output_tbl
   PL/Python function "madlib_keras_fit_multiple_model"
   ```
   
   I have also seen this error when running on a single segment.
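   
   For reference, the underlying Greenplum restriction can be reproduced with a trivial table
   definition like the one below (a hypothetical table, not the one `init_model_output_tbl`
   actually creates):
   ```
   -- minimal repro of the Greenplum restriction: the distribution key must be
   -- equal to, or a left-subset of, the primary key
   CREATE TABLE pk_dist_demo (
       id        integer,
       dist_col  integer,
       PRIMARY KEY (id)
   ) DISTRIBUTED BY (dist_col);
   -- ERROR:  PRIMARY KEY and DISTRIBUTED BY definitions incompatible
   ```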


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


