madlib-dev mailing list archives

From GitBox <>
Subject [GitHub] [madlib] dadanielniel opened a new pull request #523: Prevent an "integer out of range" exception in linear regression train
Date Tue, 27 Oct 2020 10:10:42 GMT

dadanielniel opened a new pull request #523:

   ### Module name: Linear-Regression
   ### JIRA: MADlib-1460
   ### Description:
   Linear regression training produces two output tables (**neither is optional**):
   - The **primary** output table, which contains the computed coefficients.
   - A **summary** output table, which contains a single row.
   #### Scenario
   Running linear regression training in PostgreSQL on an input table with **2^31 or more
records**, i.e. more than the maximum value of a signed 32-bit integer (2^31 - 1), fails with an
"**integer out of range**" exception, even if a grouping column is specified.
   #### Source
   **The summary table** has a column that stores **the total number of records** processed
by the computation. The column's data type is a **signed integer** (INTEGER, 32 bits), but the
total number of records is computed as a **BIGINT**. Therefore, when the number of records in the
input table exceeds the range of a signed 32-bit integer (i.e., more than 2^31 - 1 = 2,147,483,647),
an "integer out of range" exception is thrown.
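   The overflow can be illustrated with a minimal Python sketch that mimics the range check applied when a BIGINT count is written into an INTEGER or BIGINT column. The ranges are PostgreSQL's documented bounds for `integer` (int4) and `bigint` (int8); the helper names `fits` and `store_count` and the row count used are hypothetical and exist only for this illustration.

```python
# Documented ranges of PostgreSQL's signed integer column types.
PG_INT_RANGES = {
    "INTEGER": (-2**31, 2**31 - 1),  # 4-byte signed integer (int4)
    "BIGINT": (-2**63, 2**63 - 1),   # 8-byte signed integer (int8)
}


def fits(value: int, column_type: str) -> bool:
    """Return True if value lies within the range of column_type."""
    lo, hi = PG_INT_RANGES[column_type]
    return lo <= value <= hi


def store_count(value: int, column_type: str) -> int:
    """Hypothetical helper: mimic the range check performed when a
    BIGINT row count is written into a column of column_type."""
    if not fits(value, column_type):
        # PostgreSQL reports int4 overflow as "integer out of range"
        # and int8 overflow as "bigint out of range".
        name = "integer" if column_type == "INTEGER" else "bigint"
        raise OverflowError(f"{name} out of range")
    return value


if __name__ == "__main__":
    total_rows = 3_000_000_000  # hypothetical count between 2**31 and 2**32
    print(store_count(total_rows, "BIGINT"))  # 3000000000
    try:
        store_count(total_rows, "INTEGER")
    except OverflowError as err:
        print(err)  # integer out of range
```

   Any count from 2^31 up to 2^63 - 1 passes the BIGINT check but fails the INTEGER one, which matches the failure described above.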
   ### Solution
   A simple solution is to change the column's data type from a **signed integer** (INTEGER)
to a **BIGINT**.
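   As a rough sketch of how small the change is, the Python function below generates DDL for a summary table whose row-count column type is configurable. The function name `summary_table_ddl`, the table name, and the column names `source_table` and `num_rows_processed` are assumptions for illustration only, not the actual MADlib code or schema.

```python
def summary_table_ddl(table_name: str, row_count_type: str = "BIGINT") -> str:
    """Hypothetical helper: build a CREATE TABLE statement for a summary
    table. The row-count column defaults to BIGINT so that counts of
    2**31 or more rows can be stored without overflow."""
    if row_count_type not in ("INTEGER", "BIGINT"):
        raise ValueError(f"unsupported row-count type: {row_count_type}")
    return (
        f"CREATE TABLE {table_name} (\n"
        f"    source_table TEXT,\n"
        f"    num_rows_processed {row_count_type}\n"
        f")"
    )


if __name__ == "__main__":
    print(summary_table_ddl("linregr_out_summary"))
```

   The fix amounts to swapping one type name: the column keeps its meaning, and only its storage width grows from 4 to 8 bytes.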
   ### Test
   We executed the linear regression training function, with and without the suggested
modification, on an input table containing between 2^31 and 2^32 records. Without the
modification, an "integer out of range" exception was thrown; with the modification, it worked.
