Date: Mon, 6 Apr 2009 18:48:41 -0700
Message-ID: <261cf6280904061848w591c8d80o59bd4cf2790ee478@mail.gmail.com>
Subject: Re: How to implement bulk loading with a "foreign key" involved?
From: Scott Shumaker
To: user@couchdb.apache.org

Why do accounts already need to exist to create transactions? There's
nothing in CouchDB that requires that. You can generate your own ids
for these objects (as long as you have a nice deterministic way to do
so) and just reference the appropriate object.

e.g. for an account object with accountId, insert:
{type: "account", _id: "account-" + account.accountId, ...}

and for a transaction referencing account A1, insert:
{type: "transaction", accountId: "account-" + A1}
(and let CouchDB generate the _id)

etc.

On Mon, Apr 6, 2009 at 5:54 PM, David Mitchell wrote:
> Hello everyone,
> I'm having trouble working out how to implement bulk loading of data in a
> particular application I'm working on.
>
> Making it as simple to understand as possible, I've got two types of
> document that need to be stored:
> - "accounts", which includes all the information about a user account (e.g.
> name of the user, account ID, address, account creation date, ...)
> - "transactions", which are tied to a specific account. Transaction
> information would include e.g. a transaction ID, account ID, transaction
> date, ...
> Importantly, I don't know which fields I'll receive in my
> "transaction" records, so I need a schema-less storage model.
>
> If this was SQL, I'd have 2 separate tables, with a foreign key from each
> "transactions" record pointing to a record in "accounts". Nice and simple
> for my SQL-trained brain to work with, but the data I have to work with is
> inherently schema-less so an RDBMS isn't going to work.
>
> With CouchDB, I've got this data going into a single database. This seems
> to be the accepted best practice, and makes sense for this specific
> application. This works fine now, but is taking too long - I can't keep up
> with the rate of incoming data as long as I'm loading it in one record at a
> time.
>
> Assume the following naming conventions:
> - A1 is account number 1, A2 is account number 2, ...
> - T1A1 is the first transaction against account number 1, T3A4 is the third
> transaction against account number 4
>
> The data I'm loading may come in the following sequence: A1, T1A1, A2, T2A1,
> A3, T3A1, T1A2, T1A3, A4, T2A2, ... In other words, I'm receiving new
> account data intermixed with new transaction data. I'll never receive a
> transaction for an account that doesn't already exist. Again, nothing
> unusual for a real life application.
>
> I'd really like to be bulk-loading in the data, as the need to load it
> quickly overrides all other requirements at this point. However, as I
> understand it, bulk loading the data will require that accounts already
> exist for any transactions, and that's difficult given the intermixing of
> account and transaction data coming in.
>
> One possibility is that I could conceivably force the end of a bulk load
> "transaction" every time I see a new account number; doing that would ensure
> that I'm never trying to generate a transaction against an account that
> isn't already in the database.
> However, I'm wondering if this is the best
> way of dealing with this situation, which is presumably fairly common.
>
> Any thoughts/ideas/suggestions welcome.
>
> Thanks in advance
>
> Dave M.
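
For illustration, here is a rough Python sketch of the deterministic-id approach from the reply at the top of this message. It only builds the JSON body that could be POSTed to the database's _bulk_docs endpoint; it does not talk to a server. The record layout (a "type" field plus a raw "accountId" such as "A1") and the helper names account_doc_id and to_bulk_docs are assumptions made up for this example, not anything from the original posts.

```python
import json


def account_doc_id(account_id):
    # Deterministic _id for an account document (hypothetical helper).
    return "account-" + str(account_id)


def to_bulk_docs(records):
    """Turn a mixed stream of account/transaction records into one bulk payload.

    Assumes each record is a dict with a "type" key and a raw "accountId".
    Because the account _id is derived from accountId, a transaction can
    reference its account without the account having been saved first.
    """
    docs = []
    for rec in records:
        doc = dict(rec)  # copy so the caller's record is not modified
        if doc["type"] == "account":
            doc["_id"] = account_doc_id(doc["accountId"])
        elif doc["type"] == "transaction":
            # No _id here: CouchDB assigns one when the doc is saved.
            doc["accountId"] = account_doc_id(doc["accountId"])
        docs.append(doc)
    return {"docs": docs}


# Intermixed input in the order described in the question: A1, T1A1, A2, T2A1.
records = [
    {"type": "account", "accountId": "A1", "name": "first user"},
    {"type": "transaction", "accountId": "A1", "transactionId": "T1A1"},
    {"type": "account", "accountId": "A2", "name": "second user"},
    {"type": "transaction", "accountId": "A1", "transactionId": "T2A1"},
]

payload = to_bulk_docs(records)
body = json.dumps(payload)  # request body for POST /<db>/_bulk_docs
```

Since nothing checks that the referenced account exists at write time, the whole intermixed batch can go out in a single request, and the batch does not need to be cut whenever a new account number appears.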