madlib-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [madlib] reductionista opened a new pull request #521: utilities/debug.py_in
Date Tue, 06 Oct 2020 01:43:42 GMT

reductionista opened a new pull request #521:
URL: https://github.com/apache/madlib/pull/521


   There are no function functional changes in this PR, this is just for making MADlib development
and performance profiling easier.
   
   1. Adds a new collection of utility functions (`utilities/debug.py_in`), to aid with the
debugging process.
   
   The main purpose of this new debug module is to make it easy to
       turn on or off certain kinds of debug messages, without needing to make lots of temporary
changes to the code.
       There is a frequent debugging pattern which has kept repeating when it comes to performance
issues:
          - We find a bug, for example something is running slowly or filling up the disk
or memory.  But we don't know where in a long sequence of SQL queriers it's happening.
          - We add lots of different extra lines of code to print out timing information after
each query is sent, as well as what the queries look like after all python formatting params
have been filled in, what their EXPLAIN plans look like, and sometimes other things... like
which models are on which segments after each hop (for the case of `madlib_keras_fit_mutliple`).
          - We check these changes in on a temporary branch, and spend a while investigating,
eventually getting a good picture of where the performance bottleneck is.  As we go along,
often the types of information getting printed get more complicated and spread out through
the code, and these commits get all mixed up with simultaneously commits we're making to actually
fix the bug.
           - After a series of tweeks to different stages of the UDF implementation, the performance
profiling looks good again, and we declare the bug fixed.
           - Then we sift through a complicated mess of commits, stripping out any debugging
changes so they don't get merged in to the production code.
           - Soon after, we find another bug, and now we've got to start putting some debugging
statements back in, little by little.  Eventually the whole cycle repeats, and we've done
and undone a lot of work many times at this point.
   
   To improve the situation, we can build any of the profiling and debugging statements right
into the production code, in a way where it can be easily enabled or disabled simply by changing
a single line of code.  (In the future, we could even make it a compiler option.)
   
   Hopefully this will speed up the process of tracking down performance bottlenecks and reduce
the amount of redundant work repeated.
   
      debug.start_timing(msg)
         Starts a timer.  msg is any name or short description of the timer which can be grepped
for later.  Multiple timers can be running simultaneously, and they won't interfere with each
other as long as they are each named differently.  This will print a header with column names
that can be matched up later with the actual timings as they get printed.
   
       debug.print_timing(msg, force=False) :
          Used to print timing information for different stages of a python UDF (for example,
madlib_keras_fit_multiple()).
          This will print (or not print) how many seconds have elapsed since this particular
timer (associate with msg param) was started. If enabled, it will print a nicely formatted
banner with information in different columns that can be easily extracted from logs with the
grep and cut commands.  For example, we could use it to break down the current run_training()
function into _hop_time_, _uda_time_, _truncate_time_, _copy_time_, and _delete_time_.  I'm
not including this example yet because it's on a branch with a major refactor and these particular
stages no longer line up.  (Refactor will be raised as a separate PR and call these functions
with names appropriate for the new stages.)
   
         Enable all timing messages in a source file (just after debug module has been imported):
            debug.timings_enabled = True
   
         Enable a single call to debug.print_timings()
            by passing force=True.  Will print even if
            debug.timings_enabled is False
   
       debug.print_mst_keys(force=False) :
           If enabled, prints a map between mst_keys and seg_id's as the mst_keys
           hop around from segment to segment... to verify correct
           MOP behavior.
   
       debug.plpy.execute(query, ..., force=False)
   
           If enabled, prints query, EXPLAIN ANALYZE, and timing of a query
           while executing it.  Otherwise, just runs the query with plpy.execute(query, ...)
   
       debug.plpy.info(msg,..., force=False):
          Calls plpy.info(msg, ...) if enabled.
          Otherwise, does nothing.
   
       debug.plpy.debug(msg, ..., force=False):
           Behaves like plpy.debug(msg, ...) if disabled (printing only
           if DEBUG level is set high enough), but becomes a
           plpy.info() if enabled.
   
   2. One minor fix to silence a warning message that started showing up for me recently (I
think when I upgraded from pg9 to pg12).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



Mime
View raw message