MIME-Version: 1.0
From: Jonathan Ellis <jbellis@gmail.com>
Date: Sat, 20 Feb 2010 15:20:09 -0500
Message-ID: <e06563881002201220x2c453c9fq1749b5f0fedd5b0e@mail.gmail.com>
Subject: Re: Testing row cache feature in trunk: write should put record in
	cache
To: cassandra-user@incubator.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

We don't use native Java serialization for anything but the on-disk
BitSets in our bloom filters (because those are deserialized once at
startup, so the overhead doesn't matter), btw.
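
For a rough sense of the per-object overhead of native Java serialization on a small object, here is a minimal sketch that compares it with a hand-written encoding. The `Point` class and both helper methods are hypothetical illustrations, not Cassandra code; only standard `java.io` classes are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationOverhead {
    // Hypothetical small value type (an assumption for illustration only).
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Default Java serialization: the stream carries a header and class
    // metadata in addition to the field values.
    static byte[] javaSerialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hand-written encoding: only the two int fields, 4 bytes each.
    static byte[] manualSerialize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println("default serialization: " + javaSerialize(p).length + " bytes");
        System.out.println("manual encoding:       " + manualSerialize(p).length + " bytes");
    }
}
```

The manual encoding is always 8 bytes here; the default-serialized form is larger because of the embedded class description, which is why the cost matters less for something read once at startup than for something serialized per row.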

We're talking about adding compression after
https://issues.apache.org/jira/browse/CASSANDRA-674.

On Sat, Feb 20, 2010 at 3:12 PM, Tatu Saloranta <tsaloranta@gmail.com> wrote:
> On Fri, Feb 19, 2010 at 11:44 AM, Weijun Li <weijunli@gmail.com> wrote:
>> I see. How much is the overhead of java serialization? Does it slow down the
>> system a lot? It seems to be a tradeoff between CPU usage and memory.
>
> This should be relatively easy to measure as a stand-alone thing, or
> maybe even from profiler stack traces.
> If native Java serialization is used, there may be more efficient
> alternatives, depending on the data. Default serialization is highly
> inefficient for small object graphs (like individual objects) but OK
> for larger graphs; this is because much of the class metadata is
> included, so the result is very self-contained.
> Beyond default serialization, there are more efficient general-purpose
> Java serialization frameworks, like Kryo, or fast JSON-based
> serializers (Jackson); see
> http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking
> for some idea of the alternatives.
>
> In fact, one interesting idea would be to further trade some CPU for
> less memory by using fast compression (like LZF). I hope to experiment
> with this idea some time in the future. But the challenge is that this
> would help most with a clustered scheme (compressing more than one
> distinct item), which is much trickier to make work. Compression does OK
> with individual items, but the real boost comes from redundancy between
> similar items.
>
> -+ Tatu +-
>
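
The point about redundancy between similar items can be illustrated with a minimal sketch that compresses a set of similar rows one at a time and then as a single batch. This uses `java.util.zip.Deflater` from the JDK as a stand-in, not LZF, and the row format produced by `sampleRows` is a made-up example rather than anything from Cassandra.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class BatchCompression {
    // Returns the DEFLATE-compressed size of the input, in bytes.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Hypothetical rows that share most of their structure.
    static String[] sampleRows(int n) {
        String[] rows = new String[n];
        for (int i = 0; i < n; i++) {
            rows[i] = "{\"user_id\":" + (1000 + i)
                    + ",\"status\":\"active\",\"country\":\"US\",\"plan\":\"basic\"}";
        }
        return rows;
    }

    // Each row compressed on its own: no redundancy shared across rows.
    static int sizeIndividually(String[] rows) {
        int total = 0;
        for (String row : rows) {
            total += compressedSize(row.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // All rows compressed together: repeated structure is encoded once.
    static int sizeAsBatch(String[] rows) {
        String joined = String.join("\n", rows);
        return compressedSize(joined.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String[] rows = sampleRows(100);
        System.out.println("compressed individually: " + sizeIndividually(rows) + " bytes");
        System.out.println("compressed as one batch: " + sizeAsBatch(rows) + " bytes");
    }
}
```

The batch result is smaller because the compressor can reference earlier rows; the trade-off, as noted above, is that reading one item back then requires decompressing the group it was stored with.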