Hi Anthony, this does NOT look like an Ubuntu problem. In fact, open-source Greenplum is officially available on Ubuntu, as you can see here:

Greenplum and PostgreSQL do limit each field (row/column combination) to 1 GB, but there are techniques for managing data sets within this constraint. I will let someone with more experience than me working with matrices answer how best to do so in a case like the one you have described.
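
To illustrate the general idea of working within a per-field size limit, here is a minimal sketch in plain Python. It is not MADlib's API and not necessarily the approach a MADlib developer would recommend: it only shows how a sparse matrix-vector product can be computed one row block at a time, so that no single result array ever holds more than `block_size` entries. The function name, the `(row, col, value)` layout, and `block_size` are all hypothetical.

```python
from collections import defaultdict


def blockwise_sparse_mvm(entries, vector, n_rows, block_size):
    """Yield (block_start, partial_result) pairs, one per row block.

    entries: iterable of (row, col, value) triples, 0-based
             (a hypothetical layout, chosen for illustration only).
    vector:  dense vector indexed by column.
    """
    # Accumulate row sums grouped by block, so each block's output
    # can be materialized independently of the others.
    blocks = defaultdict(dict)
    for row, col, value in entries:
        local = blocks[row // block_size]
        local[row] = local.get(row, 0.0) + value * vector[col]

    # Emit one bounded-size dense array per block; rows with no
    # nonzero entries contribute 0.0.
    for start in range(0, n_rows, block_size):
        local = blocks.get(start // block_size, {})
        stop = min(start + block_size, n_rows)
        yield start, [local.get(r, 0.0) for r in range(start, stop)]


# Tiny usage example: a 5 x 3 sparse matrix times a dense 3-vector.
entries = [(0, 0, 2.0), (0, 2, 1.0), (3, 1, 4.0), (4, 2, 5.0)]
vector = [1.0, 2.0, 3.0]
result = list(blockwise_sparse_mvm(entries, vector, n_rows=5, block_size=2))
```

In a database setting, the analogous move would be to store or return each block as its own row rather than collecting the whole result into one array, but how best to do that with MADlib is a question for the developers.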


On Wed, Jan 3, 2018 at 2:22 PM, Anthony Thomas <ahthomas@eng.ucsd.edu> wrote:
Hi Madlib folks,

I have a large tall and skinny sparse matrix which I'm trying to multiply by a dense vector. The matrix is 1.25e8 by 100 with approximately 1% nonzero values. This operation always triggers an error from Greenplum:

plpy.SPIError: invalid memory alloc request size 1073741824 (context 'accumArrayResult') (mcxt.c:1254) (plpython.c:4957)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "matrix_vec_mult", line 24, in <module>
    matrix_in, in_args, vector)
  PL/Python function "matrix_vec_mult", line 2044, in matrix_vec_mult
  PL/Python function "matrix_vec_mult", line 2001, in _matrix_vec_mult_dense
PL/Python function "matrix_vec_mult"

Some Googling suggests this error is caused by a hard limit in Postgres which restricts the maximum size of an array to 1 GB. If this is indeed the cause of the error I'm seeing, does anyone have any suggestions about how to circumvent this issue? This comes up in other cases as well, like transposing a tall and skinny matrix. MVM with smaller matrices works fine.
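
As a rough sanity check on that explanation, the arithmetic below compares the sizes involved. It is only a sketch under stated assumptions: that each element of the result occupies 8 bytes, and that the array accumulator grows its buffer by doubling its element count from a small starting capacity. Neither assumption has been verified against the Greenplum source here.

```python
# Back-of-the-envelope check of the sizes involved.
N_ROWS = 125_000_000        # 1.25e8 rows, from the matrix description
BYTES_PER_ELEMENT = 8       # assumption: 8-byte elements on x86_64
REQUESTED = 1_073_741_824   # request size reported in the error message

# Size of a dense result vector with one element per matrix row.
result_bytes = N_ROWS * BYTES_PER_ELEMENT

# Assumption: the accumulator doubles its capacity until it can hold
# every element. The starting capacity of 64 is hypothetical; any
# power-of-two start at or below 2**27 gives the same final capacity.
capacity = 64
while capacity < N_ROWS:
    capacity *= 2
doubled_bytes = capacity * BYTES_PER_ELEMENT

print(result_bytes)                # raw result size in bytes
print(doubled_bytes)               # buffer size after the final doubling
print(doubled_bytes == REQUESTED)  # compare with the failed request
```

Under these assumptions the raw result (1,000,000,000 bytes) would fit just under 1 GB, but the final doubling step asks for exactly 1,073,741,824 bytes, which matches the request size in the error message.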

Here is relevant version information:

PostgreSQL 8.3.23 (Greenplum Database 5.1.0 build dev) on x86_64-pc-linux-gnu, compiled by GCC gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609 compiled on Dec 21 2017 09:09:46

SELECT madlib.version();
MADlib version: 1.12, git revision: unknown, cmake configuration time: Thu Dec 21 18:04:47 UTC 2017, build type: RelWithDebInfo, build system: Linux-4.4.0-103-generic, C compiler: gcc 4.9.3, C++ compiler: g++ 4.9.3

MADlib install-check reported one error in the "convex" module related to "loss too high", which seems unrelated to the issue described above. I know Ubuntu isn't officially supported by Greenplum, so I'd like to be confident this issue isn't just the result of using an unsupported OS. Please let me know if any other information would be helpful.



Ivan Novick, Product Manager Pivotal Greenplum
inovick@pivotal.io --  (Mobile) 408-230-6491