madlib-user mailing list archives

From Anthony Thomas <ahtho...@eng.ucsd.edu>
Subject Re: Multiplying a large sparse matrix by a vector
Date Thu, 04 Jan 2018 04:15:17 GMT
Thanks Frank, the answer to both of your questions is "yes".

Best,

Anthony

On Wed, Jan 3, 2018 at 3:13 PM, Frank McQuillan <fmcquillan@pivotal.io>
wrote:

>
> Anthony,
>
> Correct, the install-check error you are seeing is not related.
>
> A couple of questions:
>
> (1)
> Are you using:
>
> -- Multiply matrix with vector
>   matrix_vec_mult( matrix_in, in_args, vector)
>
> (2)
> Is matrix_in encoded in the sparse format shown at the top of
> http://madlib.apache.org/docs/latest/group__grp__matrix.html
>
> e.g., like this?
>
> row_id | col_id | value
> --------+--------+-------
>       1 |      1 |     9
>       1 |      5 |     6
>       1 |      6 |     6
>       2 |      1 |     8
>       3 |      1 |     3
>       3 |      2 |     9
>       4 |      7 |     0
>
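For reference, multiplying a matrix stored as (row_id, col_id, value) triples by a dense vector only needs to visit the stored entries. The Python sketch below illustrates that idea on the example table above. It is not MADlib's implementation, and `sparse_matvec` and the vector `x` are hypothetical:

```python
from collections import defaultdict

# Sparse matrix in the (row_id, col_id, value) layout shown above; ids are 1-based.
triples = [
    (1, 1, 9), (1, 5, 6), (1, 6, 6),
    (2, 1, 8),
    (3, 1, 3), (3, 2, 9),
    (4, 7, 0),
]


def sparse_matvec(triples, x):
    """Return {row_id: sum(value * x[col_id])}, touching only stored entries."""
    y = defaultdict(float)
    for row_id, col_id, value in triples:
        y[row_id] += value * x[col_id - 1]  # x is a 0-based Python list
    return dict(y)


x = [1, 2, 3, 4, 5, 6, 7]  # hypothetical dense vector, one entry per column
print(sparse_matvec(triples, x))  # {1: 75.0, 2: 8.0, 3: 21.0, 4: 0.0}
```

Rows with no stored entries would simply be absent from the result (an implicit zero). The work is proportional to the number of nonzeros, but the result still has one entry per row, so for a very tall matrix the output itself can be large.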
>
> Frank
>
>
> On Wed, Jan 3, 2018 at 2:52 PM, Anthony Thomas <ahthomas@eng.ucsd.edu>
> wrote:
>
>> Okay - thanks Ivan, and good to know about support for Ubuntu from
>> Greenplum!
>>
>> Best,
>>
>> Anthony
>>
>> On Wed, Jan 3, 2018 at 2:38 PM, Ivan Novick <inovick@pivotal.io> wrote:
>>
>>> Hi Anthony, this does NOT look like an Ubuntu problem, and in fact OSS
>>> Greenplum is officially available on Ubuntu; you can see it here:
>>> http://greenplum.org/install-greenplum-oss-on-ubuntu/
>>>
>>> Greenplum and PostgreSQL do limit each field (row/column combination) to
>>> 1 GB, but there are techniques for working with data sets within these
>>> constraints. I will let someone with more experience than me working with
>>> matrices explain the best way to do so in a case like yours.
>>>
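The techniques are not named here. One common approach (an assumption, not something stated in this thread) is to split a tall matrix into row blocks so that no single array value comes near the limit. A Python sketch of the idea follows; `row_blocks`, `blocked_matvec`, and the 256 MiB per-block budget are hypothetical:

```python
# Hypothetical per-block budget, kept well below the 1 GB per-field limit to leave
# headroom for any over-allocation while arrays are being built.
BLOCK_BYTES = 256 * 1024 * 1024
FLOAT8 = 8  # bytes per double-precision element


def row_blocks(n_rows, max_bytes=BLOCK_BYTES, elem_size=FLOAT8):
    """Yield contiguous 1-based (first_row, last_row) ranges whose dense
    float8 result fits within max_bytes."""
    rows_per_block = max(1, max_bytes // elem_size)
    start = 1
    while start <= n_rows:
        end = min(start + rows_per_block - 1, n_rows)
        yield start, end
        start = end + 1


def blocked_matvec(triples, x, n_rows, max_bytes=BLOCK_BYTES):
    """Multiply (row_id, col_id, value) triples by dense vector x one row block
    at a time, returning [((first_row, last_row), partial_result), ...]."""
    results = []
    for first, last in row_blocks(n_rows, max_bytes):
        y = [0.0] * (last - first + 1)
        # In SQL this filter would be a WHERE clause on a row_id range.
        for row_id, col_id, value in triples:
            if first <= row_id <= last:
                y[row_id - first] += value * x[col_id - 1]
        results.append(((first, last), y))
    return results


# Tiny example: a 16-byte budget forces two blocks of two float8 rows each.
triples = [(1, 1, 9), (1, 5, 6), (1, 6, 6), (2, 1, 8), (3, 1, 3), (3, 2, 9), (4, 7, 0)]
print(blocked_matvec(triples, [1, 2, 3, 4, 5, 6, 7], n_rows=4, max_bytes=16))
# [((1, 2), [75.0, 8.0]), ((3, 4), [21.0, 0.0])]
```

Each block's result is kept separately (for example, one row per block in a result table) rather than aggregated into a single array, which keeps every individual value well under the limit. Scanning all triples per block is only for brevity; a real query would let the database filter by row_id.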
>>> Cheers,
>>> Ivan
>>>
>>> On Wed, Jan 3, 2018 at 2:22 PM, Anthony Thomas <ahthomas@eng.ucsd.edu>
>>> wrote:
>>>
>>>> Hi Madlib folks,
>>>>
>>>> I have a large, tall-and-skinny sparse matrix that I'm trying to
>>>> multiply by a dense vector. The matrix is 1.25e8 by 100 with approximately
>>>> 1% nonzero values. This operation always triggers an error from Greenplum:
>>>>
>>>> plpy.SPIError: invalid memory alloc request size 1073741824 (context 'accumArrayResult') (mcxt.c:1254) (plpython.c:4957)
>>>> CONTEXT:  Traceback (most recent call last):
>>>>   PL/Python function "matrix_vec_mult", line 24, in <module>
>>>>     matrix_in, in_args, vector)
>>>>   PL/Python function "matrix_vec_mult", line 2044, in matrix_vec_mult
>>>>   PL/Python function "matrix_vec_mult", line 2001, in _matrix_vec_mult_dense
>>>> PL/Python function "matrix_vec_mult"
>>>>
>>>> Some Googling suggests this error is caused by a hard limit in Postgres
>>>> that restricts the maximum size of an array to 1 GB. If this is indeed the
>>>> cause of the error I'm seeing, does anyone have suggestions for
>>>> circumventing it? This comes up in other cases as well, such as
>>>> transposing a tall-and-skinny matrix. MVM (matrix-vector multiplication)
>>>> with smaller matrices works fine.
>>>>
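The size in the error message can be reproduced with a little arithmetic. Assume `accumArrayResult` stores one 8-byte Datum per element in a buffer that starts at 64 slots and doubles when full (an assumption about PostgreSQL 8.3 internals, not something stated in this thread). Then any array with more than 2^26 = 67,108,864 elements triggers a request for 2^27 * 8 = 1073741824 bytes, one byte over a limit of 1 GB minus one byte. Both the 1.25e8-entry result vector and the roughly 1.25e8 nonzeros are past that threshold; the traceback alone does not show which array is being accumulated. `doubling_request` is a hypothetical helper:

```python
ROWS, COLS = 125_000_000, 100
nnz = ROWS * COLS // 100         # ~1% nonzero -> 125,000,000 stored entries
result_bytes = ROWS * 8          # dense float8 result vector -> 1,000,000,000 bytes
MAX_ALLOC = 0x3FFFFFFF           # assumed allocation limit: 1 GB minus one byte


def doubling_request(n_elems, initial=64, elem_size=8):
    """Bytes requested for a buffer that starts with `initial` slots and
    doubles until it can hold n_elems (assumed accumArrayResult behavior)."""
    alen = initial
    while alen < n_elems:
        alen *= 2
    return alen * elem_size


req = doubling_request(ROWS)
print(nnz, result_bytes, req, req > MAX_ALLOC)  # 125000000 1000000000 1073741824 True
```

Under the same assumption, the final 1e9-byte array would fit, but the doubling step overshoots to 2^30 bytes. It would also explain why smaller matrices work: at or below 2^26 elements, the largest request is 2^29 bytes (512 MiB).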
>>>> Here is relevant version information:
>>>>
>>>> SELECT VERSION();
>>>> PostgreSQL 8.3.23 (Greenplum Database 5.1.0 build dev) on x86_64-pc-linux-gnu,
>>>> compiled by GCC gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
>>>> compiled on Dec 21 2017 09:09:46
>>>>
>>>> SELECT madlib.version();
>>>> MADlib version: 1.12, git revision: unknown, cmake configuration time:
>>>> Thu Dec 21 18:04:47 UTC 2017, build type: RelWithDebInfo, build system:
>>>> Linux-4.4.0-103-generic, C compiler: gcc 4.9.3, C++ compiler: g++ 4.9.3
>>>>
>>>> MADlib install-check reported one error in the "convex" module ("loss
>>>> too high"), which seems unrelated to the issue described above. I know
>>>> Ubuntu isn't officially supported by Greenplum, so I'd like to be
>>>> confident this issue isn't just the result of using an unsupported OS.
>>>> Please let me know if any other information would be helpful.
>>>>
>>>> Thanks,
>>>>
>>>> Anthony
>>>>
>>>
>>>
>>>
>>> --
>>> Ivan Novick, Product Manager Pivotal Greenplum
>>> inovick@pivotal.io -- (Mobile) 408-230-6491
>>> https://www.youtube.com/GreenplumDatabase
>>>
>>>
>>
>
