Date: Mon, 11 Nov 2013 15:19:07 +0000
From: Simon Metson
To: user@couchdb.apache.org
Message-ID: <56EA484C90F04E15B3D5F89581E8A6A6@cloudant.com>
References: <527E548F.4020203@yandex.ru> <1CD5C733-4854-407A-A8DF-8E29A1A0493B@couchbase.com>
Subject: Re: Storage limitations?

I think if you can happily fit it all into CouchDB (and from the numbers
you’ve mentioned so far I’m confident you can) I’d go with that - simplest
solution. If your 1000’s become 1000000’s then you can bite the bullet and
switch to metadata in CouchDB, image data in something else.

On Monday, 11 November 2013 at 15:16, Mark Deibert wrote:
> @Simon: I understand what you're saying about "source of truth". What is
> the master record, the files or the Couch records, if the upload succeeds
> and the Couch record write fails or vice versa? Not sure what the answer
> is here either.
>
> On Mon, Nov 11, 2013 at 10:00 AM, Mark Deibert wrote:
>
> > @Simon: Thanks for the advice. As I type, I don't anticipate any
> > aggregation on the Photo docs, but I should think about that some more.
> >
> > On Mon, Nov 11, 2013 at 9:56 AM, Simon Metson wrote:
> >
> > > Hey,
> > > I’d go with what you’re doing (1:1 doc:photo).
> > >
> > > 2 seems a bit weird - do you mean host the app on a different server
> > > from the data, or have different databases for different types of data
> > > (that might work, if you never want to query across types of data)?
> > > 3 might work so long as you never want to aggregate across the
> > > categories. 4 is a reasonable approach for very, very large
> > > attachments, but you can get into consistency issues - what’s the
> > > source of truth, the database or the filesystem?
> > > Cheers
> > > Simon
> > >
> > > On Monday, 11 November 2013 at 14:41, Mark Deibert wrote:
> > >
> > > > A followup on the "1000's of images" question. I could approach this
> > > > a couple of ways. Currently each image is attached to its own Photo
> > > > doc, which I've read is better for replication than one doc with
> > > > many attachments. So that's fine, but will Couch have any issue
> > > > managing several thousand of these Photo docs, each with a 3MB'ish
> > > > image attachment? If you were building this Couchapp, would you...
> > > >
> > > > 1) Keep the photos as described above in one CouchDB
> > > > 2) Move the Photo docs with attachments out into a separate CouchDB
> > > > 3) Do 2, but break Photos into multiple categorized CouchDBs
> > > > 4) Upload the images to the filesystem, just store the link in Couch
> > > >
> > > > I want to build this Couchapp in such a way as to not make life
> > > > miserable for CouchDB :-D
> > > >
> > > > On Sun, Nov 10, 2013 at 6:33 PM, Dave Cottlehuber <dch@jsonified.com> wrote:
> > > >
> > > > > On 10 November 2013 23:14, Mark Deibert <mark.deibert@gmail.com> wrote:
> > > > > > I read an article somewhere that using include_docs is "hard" on
> > > > > > memory or disk or in some way taxes Couch and therefore you
> > > > > > should just emit the doc. Is this true?
> > > > >
> > > > > Like most general statements it has some truth and some lies in it :-)
> > > > >
> > > > > views and docs are stored in separate .couch btree files on disk.
> > > > >
> > > > > emit(key, doc) puts a full copy of the doc (that's already in the doc
> > > > > .couch b~tree) into the view b~tree.
> > > > >
> > > > > advantage - no need to hop over to the doc .couch file to retrieve the
> > > > > document.
> > > > > disadvantage - you now have 2 copies of the doc in separate files,
> > > > > wasted space.
> > > > >
> > > > > If you do things right, and your app fits this model, the generated
> > > > > etags from views and docs can be cached in nginx or similar, and
> > > > > repeated queries don't need to hit your couch.
> > > > >
> > > > > So yes, include_docs means extra reads, but like most of these things
> > > > > you should benchmark your situation, under a realistic load, not just
> > > > > pumping 1000 single-doc reads at it.
> > > > >
> > > > > A+
> > > > > Dave
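
To make the trade-off Dave describes a little more concrete, here is a rough sketch in Python rather than a real CouchDB map function. Everything in it is an assumption made for illustration: doc_tree, build_view, query and stored_bytes are hypothetical stand-ins, the two dicts only mimic the separate doc and view b-trees, and JSON length is a crude proxy for disk usage. It is not how CouchDB stores data and it says nothing about real performance, which, as Dave says, you would need to benchmark.

```python
import json

# Hypothetical stand-in for the doc b-tree: 1000 small Photo docs.
doc_tree = {
    f"photo-{i}": {
        "_id": f"photo-{i}",
        "type": "photo",
        "caption": f"Photo {i}",
        "tags": ["holiday", "2013"],
    }
    for i in range(1000)
}


def build_view(emit_full_doc):
    """Mimic a view keyed by caption.

    emit_full_doc=True  is like emit(doc.caption, doc)  (second copy of the doc)
    emit_full_doc=False is like emit(doc.caption, null) (key and doc id only)
    """
    view = {}
    for doc_id, doc in doc_tree.items():
        value = dict(doc) if emit_full_doc else None
        view[doc["caption"]] = {"id": doc_id, "value": value}
    return view


def query(view, key, include_docs=False):
    """Return (doc, extra_doc_tree_reads) for one key."""
    row = view[key]
    if include_docs:
        # The extra hop over to the doc tree that include_docs implies.
        return doc_tree[row["id"]], 1
    return row["value"], 0


def stored_bytes(tree):
    """Crude size estimate: length of the JSON serialisation."""
    return len(json.dumps(tree))


fat_view = build_view(emit_full_doc=True)
lean_view = build_view(emit_full_doc=False)

if __name__ == "__main__":
    print("view size with emit(key, doc): ", stored_bytes(fat_view))
    print("view size with emit(key, null):", stored_bytes(lean_view))
    print("extra reads, fat view:         ", query(fat_view, "Photo 7")[1])
    print("extra reads, include_docs:     ", query(lean_view, "Photo 7", include_docs=True)[1])
```

Both queries return the same document; the toy only shows where the cost moves, from a larger view to one more lookup per row.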