From: Ivan Novick
Date: Wed, 3 Jan 2018 14:38:04 -0800
Subject: Re: Multiplying a large sparse matrix by a vector
To: user@madlib.apache.org

Hi Anthony,

This does NOT look like an Ubuntu problem. In fact, OSS Greenplum is
officially available on Ubuntu, as you can see here:

http://greenplum.org/install-greenplum-oss-on-ubuntu/

Greenplum and PostgreSQL do limit each field (row/column combination)
to 1 GB, but there are techniques for managing data sets within this
constraint. I will let someone who has more experience than I do
working with matrices answer how best to do so in a case like the one
you have described.

Cheers,
Ivan

On Wed, Jan 3, 2018 at 2:22 PM, Anthony Thomas <ahthomas@eng.ucsd.edu> wrote:

> Hi Madlib folks,
>
> I have a large tall and skinny sparse matrix which I'm trying to multiply
> by a dense vector. The matrix is 1.25e8 by 100 with approximately 1%
> nonzero values. This operation always triggers an error from Greenplum:
>
> plpy.SPIError: invalid memory alloc request size 1073741824 (context
> 'accumArrayResult') (mcxt.c:1254) (plpython.c:4957)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "matrix_vec_mult", line 24, in <module>
>     matrix_in, in_args, vector)
>   PL/Python function "matrix_vec_mult", line 2044, in matrix_vec_mult
>   PL/Python function "matrix_vec_mult", line 2001, in _matrix_vec_mult_dense
> PL/Python function "matrix_vec_mult"
>
> Some Googling suggests this error is caused by a hard limit in Postgres
> which restricts the maximum size of an array to 1GB. If this is indeed the
> cause of the error I'm seeing, does anyone have any suggestions about how
> to circumvent this issue? This comes up in other cases as well, like
> transposing a tall and skinny matrix. MVM with smaller matrices works fine.
>
> Here is the relevant version information:
>
> SELECT VERSION();
> PostgreSQL 8.3.23 (Greenplum Database 5.1.0 build dev) on
> x86_64-pc-linux-gnu, compiled by GCC gcc
> (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609 compiled on Dec 21 2017
> 09:09:46
>
> SELECT madlib.version();
> MADlib version: 1.12, git revision: unknown, cmake configuration time: Thu
> Dec 21 18:04:47 UTC 2017, build type: RelWithDebInfo, build system:
> Linux-4.4.0-103-generic, C compiler: gcc 4.9.3, C++ compiler: g++ 4.9.3
>
> Madlib install-check reported one error in the "convex" module related to
> "loss too high", which seems unrelated to the issue described above. I know
> Ubuntu isn't officially supported by Greenplum, so I'd like to be confident
> this issue isn't just the result of using an unsupported OS. Please let me
> know if any other information would be helpful.
>
> Thanks,
>
> Anthony

-- 
Ivan Novick, Product Manager Pivotal Greenplum
inovick@pivotal.io -- (Mobile) 408-230-6491
https://www.youtube.com/GreenplumDatabase
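
For readers of the archive, here is a rough sketch of why a request of
exactly 1073741824 bytes is consistent with the 1 GB limit discussed
above. It is not taken from the thread and rests on assumptions: that
the array accumulator grows its buffer by doubling from a starting
capacity of 64 slots, that each slot is an 8-byte Datum on a 64-bit
build, and that the accumulated array holds one element per matrix row.
The only value not assumed is PostgreSQL's MaxAllocSize (1 GB minus one
byte). The function name is hypothetical.

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF  # PostgreSQL's MaxAllocSize: 1 GB minus 1 byte
DATUM_BYTES = 8              # assumption: 64-bit build, 8-byte Datum
INITIAL_SLOTS = 64           # assumption: accumulator's starting capacity


def first_failing_request(n_elements):
    """Return the first buffer request (in bytes) that exceeds the limit.

    Models a buffer that doubles whenever it is too small to hold
    n_elements. Returns None if every request stays within the limit.
    """
    slots = INITIAL_SLOTS
    while slots < n_elements:
        slots *= 2
        request = slots * DATUM_BYTES
        if request > MAX_ALLOC_SIZE:
            return request
    return None


rows = 125_000_000  # 1.25e8 rows, as in the original report
print(first_failing_request(rows))       # full result in one array
print(first_failing_request(rows // 2))  # half the rows per array
```

Under these assumptions the full 1.25e8-element array needs a buffer
that overshoots the limit by a single byte, while an array of half that
length does not. That suggests, but does not prove, that processing the
matrix in row blocks could stay within the constraint.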