Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of doanduyhai@gmail.com designates
 209.85.218.51 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <045D8FD556C73347A47F956EE65F822018574E27@S11MAILD013N2.sh11.lan>
References: <045D8FD556C73347A47F956EE65F822018574E27@S11MAILD013N2.sh11.lan>
Date: Fri, 9 Jan 2015 09:50:39 +0100
Message-ID: 
 <CABNXB2BHHcDY65k64LV=vX29XjDNt9DJtdL5PpxJhp723o0waw@mail.gmail.com>
Subject: Re: C* throws OOM error despite use of automatic paging
From: DuyHai Doan <doanduyhai@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=089e0158ab3c455ff7050c3440f7

--089e0158ab3c455ff7050c3440f7
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

What is the data size of the column family you're trying to fetch with
paging? Are you storing big blobs or just primitive values?
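
Row size matters here because the driver's fetch size is expressed in rows,
not bytes, so the payload of a single page grows with the size of each row.
A rough back-of-the-envelope sketch in Java is below. The class name, the
method name, and both row sizes (100 bytes and 1 MiB) are hypothetical
placeholders, not figures from your cluster, and the estimate ignores any
per-row overhead on the server.

```java
// Rough estimate only: the row sizes used here are hypothetical,
// not measurements from this thread.
final class PageSizeEstimate {

    /** Approximate payload of one page: rows per page times average bytes per row. */
    static long estimatePageBytes(int fetchSize, long avgRowBytes) {
        if (fetchSize <= 0 || avgRowBytes < 0) {
            throw new IllegalArgumentException(
                "fetchSize must be > 0 and avgRowBytes must be >= 0");
        }
        // multiplyExact throws instead of silently overflowing
        return Math.multiplyExact((long) fetchSize, avgRowBytes);
    }

    public static void main(String[] args) {
        long smallRow = 100L;          // hypothetical: a few primitive columns
        long blobRow = 1024L * 1024L;  // hypothetical: one 1 MiB blob per row

        // The two fetch sizes tried in the original message
        for (int fetchSize : new int[] {5000, 1000}) {
            System.out.printf(
                "fetchSize=%d: ~%d bytes (primitives), ~%d bytes (blobs)%n",
                fetchSize,
                estimatePageBytes(fetchSize, smallRow),
                estimatePageBytes(fetchSize, blobRow));
        }
    }
}
```

Under these assumed sizes, a page of 5000 rows with 1 MiB blobs would be
about 4.9 GiB and a page of 1000 such rows would still be roughly 1 GiB,
while pages of small primitive rows stay well under 1 MB. If your rows are
closer to the blob case, a much smaller fetch size may be worth trying.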

On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller <mohammed@glassbeam.com>
wrote:

> Hi –
>
>
>
> We have an ETL application that reads all rows from Cassandra (2.1.2),
> filters them, and stores a small subset in an RDBMS. Our application uses
> DataStax's Java driver (2.1.4) to fetch data from the C* nodes. Since the
> Java driver supports automatic paging, I was under the impression that
> SELECT queries should not cause an OOM error on the C* nodes. However, even
> with just 16GB of data on each node, the C* nodes start throwing OOM errors
> as soon as the application starts iterating through the rows of a table.
>
>
>
> The application code looks something like this:
>
>
>
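> // Fetch size is the number of rows per page
> Statement stmt = new SimpleStatement("SELECT x,y,z FROM cf")
>       .setFetchSize(5000);
>
> ResultSet rs = session.execute(stmt);
>
> // The driver fetches the next page as the current one is consumed
> while (!rs.isExhausted()) {
>       Row row = rs.one();
>       process(row);
> }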
>
>
>
> Even after we reduced the page size to 1000, the C* nodes still crash. C*
> is running on M3.xlarge machines (4 cores, 15GB). We manually increased the
> heap size to 8GB just to see how much heap C* consumes. Within 10-15
> minutes, the heap usage climbs to 7.6GB. That does not make sense. Either
> automatic paging is not working or we are missing something.
>
>
>
> Does anybody have insights as to what could be happening? Thanks.
>
>
>
> Mohammed

--089e0158ab3c455ff7050c3440f7--