From: Eric Evans <eevans@wikimedia.org>
Date: Tue, 19 Jan 2016 11:43:26 -0600
To: user@cassandra.apache.org
Subject: Re: Using Cassandra as a BLOB store / web cache.

On Mon, Jan 18, 2016 at 8:52 PM, Kevin Burton <burton@spinn3r.com> wrote:

> Internally we have the need for a blob store for web content. It's
> MOSTLY key/value based, but we'd like to have lookups by coarse-grained
> tags.
>
> This needs to store normal web content like HTML, CSS, JPEG, SVG, etc.
>
> I highly doubt that anything over 5MB would need to be stored.
>
> We also need the ability to store older versions of the same URL for
> features like "time travel", where we can see what the web looks like
> over time.
>
> I initially wrote this for Elasticsearch (and it works well for that),
> but it looks like binaries snuck into the set of requirements.
>
> I could Base64 encode/decode them in ES I guess, but that seems ugly.
>
> I was thinking of porting this over to C*, but I'm not up to date on
> the current state of blobs in C*...
>
> Any advice?

We (Wikimedia Foundation) use Cassandra as a durable cache for HTML (with
history). A simplified version of the schema we use would look something
like:
CREATE TABLE data (
    "_domain" text,
    key text,
    rev int,
    tid timeuuid,
    value blob,
    PRIMARY KEY (("_domain", key), rev, tid)
);

In our case, a 'rev' represents a normative change to the document (read:
someone made an edit), and the 'tid' attribute allows for some arbitrary
number of HTML representations of that revision (if, say, some
transclusion would alter the final outcome). You could simplify this
further by removing the 'tid' attribute if it doesn't apply to you.
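
To make the access patterns concrete (the domain, key, and revision
values below are made up purely for illustration), reads and writes
against a table like this might look something like:

INSERT INTO data ("_domain", key, rev, tid, value)
VALUES ('en.wikipedia.org', '/wiki/Cassandra', 42, now(),
        textAsBlob('<html>...</html>'));

-- latest render of the newest stored revision
SELECT value FROM data
 WHERE "_domain" = 'en.wikipedia.org' AND key = '/wiki/Cassandra'
 ORDER BY rev DESC, tid DESC LIMIT 1;

-- "time travel": renders at or before some earlier revision
SELECT rev, tid, value FROM data
 WHERE "_domain" = 'en.wikipedia.org' AND key = '/wiki/Cassandra'
   AND rev <= 17;
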
One concern here is the size of blobs. Where exactly the threshold on
size should be is debatable, but if you are using G1GC I would be careful
about what large blobs do to humongous allocations. G1 allocates anything
over half the region size as humongous and special-cases the handling of
it, so humongous allocations should be the exception, not the rule.
Depending on your heap size and the distribution of blob sizes, you might
be able to get by with overriding the GC's choice of region size, but if
5MB values are at all common you'll need 16MB regions (which probably
won't work well without a correspondingly large max heap).
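
For reference, overriding the region size is just a JVM flag; with the
stock cassandra-env.sh that might look something like the following (a
sketch only, assuming G1 is already enabled and the max heap has been
sized to match; the values are not a recommendation):

# illustrative only: 16MB regions keep ~5MB blobs below the
# humongous threshold (half a region)
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=16m"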

Another concern is row width. With a data model like this, rows grow
relative to the number of versions stored. If versions are added at a low
rate, that might not pose an issue in practice; if it does, though,
you'll need to consider a different partitioning strategy.
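
If partition growth does become a problem, one common approach (just a
sketch, not something the schema above does) is to fold a coarse bucket
into the partition key, at the cost of having to know, or iterate over,
the buckets at read time:

CREATE TABLE data_bucketed (
    "_domain" text,
    key text,
    bucket int,        -- e.g. rev / 1000, or a coarse time bucket
    rev int,
    tid timeuuid,
    value blob,
    PRIMARY KEY (("_domain", key, bucket), rev, tid)
);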

TL;DR: You need to understand what your data will look like. Min and max
value sizes aren't enough; you should have some idea of the size
distribution, read/write rates, etc. Understand the implications of your
data model. And then test, test, test.


--
Eric Evans
eevans@wikimedia.org