madlib-user mailing list archives

From: Frank McQuillan <fmcquil...@pivotal.io>
Subject: Re: Multiplying a large sparse matrix by a vector
Date: Wed, 03 Jan 2018 23:13:00 GMT
Anthony,

Correct, the install check error you are seeing is not related.

Couple of questions:

(1)
Are you using:

-- Multiply matrix with vector
  matrix_vec_mult( matrix_in, in_args, vector)

(2)
Is matrix_in encoded in sparse format, as shown at the top of
http://madlib.apache.org/docs/latest/group__grp__matrix.html

e.g., like this?

row_id | col_id | value
--------+--------+-------
      1 |      1 |     9
      1 |      5 |     6
      1 |      6 |     6
      2 |      1 |     8
      3 |      1 |     3
      3 |      2 |     9
      4 |      7 |     0
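
Here is a minimal, untested sketch of encoding a small matrix that way and
multiplying it by a dense vector (the table and column names are just
illustrative, not taken from your setup):

-- Small sparse matrix in the row_id/col_id/value encoding shown above
DROP TABLE IF EXISTS mat_sparse;
CREATE TABLE mat_sparse (row_id INTEGER, col_id INTEGER, value FLOAT8);
INSERT INTO mat_sparse VALUES
    (1, 1, 9), (1, 5, 6), (1, 6, 6),
    (2, 1, 8),
    (3, 1, 3), (3, 2, 9),
    (4, 7, 0);  -- zero entry kept so the 4 x 7 dimensions are defined

-- Multiply the 4 x 7 sparse matrix by a length-7 dense vector;
-- the product comes back as a single FLOAT8[] value
SELECT madlib.matrix_vec_mult('mat_sparse',
                              'row=row_id, col=col_id, val=value',
                              ARRAY[1, 2, 3, 4, 5, 6, 7]::FLOAT8[]);

Also note that since matrix_vec_mult returns the product as a single array
(there is no matrix_out argument), a 1.25e8-row result is on the order of
1 GB of float8 values, which lines up with the 1073741824-byte allocation
failure in your traceback.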


Frank


On Wed, Jan 3, 2018 at 2:52 PM, Anthony Thomas <ahthomas@eng.ucsd.edu>
wrote:

> Okay - thanks Ivan, and good to know about support for Ubuntu from
> Greenplum!
>
> Best,
>
> Anthony
>
> On Wed, Jan 3, 2018 at 2:38 PM, Ivan Novick <inovick@pivotal.io> wrote:
>
>> Hi Anthony, this does NOT look like an Ubuntu problem, and in fact there
>> is OSS Greenplum officially available on Ubuntu, which you can see here:
>> http://greenplum.org/install-greenplum-oss-on-ubuntu/
>>
>> Greenplum and PostgreSQL do limit each field (row/col combination) to
>> 1 GB, but there are techniques to manage data sets within these
>> constraints.  I will let someone else who has more experience than me
>> working with matrices answer what the best way to do that is in a case
>> like the one you have provided.
>>
>> Cheers,
>> Ivan
>>
>> On Wed, Jan 3, 2018 at 2:22 PM, Anthony Thomas <ahthomas@eng.ucsd.edu>
>> wrote:
>>
>>> Hi Madlib folks,
>>>
>>> I have a large tall and skinny sparse matrix which I'm trying to
>>> multiply by a dense vector. The matrix is 1.25e8 by 100 with approximately
>>> 1% nonzero values. This operation always triggers an error from Greenplum:
>>>
>>> plpy.SPIError: invalid memory alloc request size 1073741824
>>> (context 'accumArrayResult') (mcxt.c:1254) (plpython.c:4957)
>>> CONTEXT:  Traceback (most recent call last):
>>>   PL/Python function "matrix_vec_mult", line 24, in <module>
>>>     matrix_in, in_args, vector)
>>>   PL/Python function "matrix_vec_mult", line 2044, in matrix_vec_mult
>>>   PL/Python function "matrix_vec_mult", line 2001, in _matrix_vec_mult_dense
>>> PL/Python function "matrix_vec_mult"
>>>
>>> Some Googling suggests this error is caused by a hard limit in
>>> Postgres which restricts the maximum size of an array to 1 GB. If this
>>> is indeed the cause of the error I'm seeing, does anyone have any
>>> suggestions about how to work around this issue? This comes up in other
>>> cases as well, such as transposing a tall and skinny matrix. MVM with
>>> smaller matrices works fine.
>>>
>>> Here is relevant version information:
>>>
>>> SELECT VERSION();
>>> PostgreSQL 8.3.23 (Greenplum Database 5.1.0 build dev) on
>>> x86_64-pc-linux-gnu, compiled by GCC gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5)
>>> 5.4.0 20160609 compiled on Dec 21 2017 09:09:46
>>>
>>> SELECT madlib.version();
>>> MADlib version: 1.12, git revision: unknown,
>>> cmake configuration time: Thu Dec 21 18:04:47 UTC 2017,
>>> build type: RelWithDebInfo, build system: Linux-4.4.0-103-generic,
>>> C compiler: gcc 4.9.3, C++ compiler: g++ 4.9.3
>>>
>>> MADlib install-check reported one error in the "convex" module related
>>> to "loss too high", which seems unrelated to the issue described above. I
>>> know Ubuntu isn't officially supported by Greenplum, so I'd like to be
>>> confident this issue isn't just the result of using an unsupported OS.
>>> Please let me know if any other information would be helpful.
>>>
>>> Thanks,
>>>
>>> Anthony
>>>
>>
>>
>>
>> --
>> Ivan Novick, Product Manager Pivotal Greenplum
>> inovick@pivotal.io --  (Mobile) 408-230-6491
>> https://www.youtube.com/GreenplumDatabase
>>
>>
>
