incubator-couchdb-user mailing list archives

From Scott Shumaker <sshuma...@gmail.com>
Subject Re: How to implement bulk loading with a "foreign key" involved?
Date Tue, 07 Apr 2009 01:48:41 GMT
Why do accounts already need to exist to create transactions?  There's
nothing in CouchDB that requires that.  You can generate your own ids
for these objects (as long as you have a nice deterministic way to do
so) - and just reference the appropriate object.

e.g.
for an account object with accountId, insert:
{type: "account", _id: "account-" + account.accountId, ...}

and for a transaction referencing account A1, insert:
{type: "transaction", accountId: "account-" + A1, ...}  (and let CouchDB
generate the _id)

etc.
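A minimal sketch of this scheme in Python, assuming accounts get deterministic `_id`s while transactions reference them via an `accountId` field and let CouchDB assign their `_id`s. The helper names and the sample stream are illustrative, not from the thread:

```python
def account_doc(account):
    # Deterministic _id, so transactions can reference the account
    # even before (or in the same bulk batch as) its insert.
    return {**account,
            "type": "account",
            "_id": "account-" + str(account["accountId"])}

def transaction_doc(txn):
    # No _id field: CouchDB will generate one. The accountId field
    # is rewritten to match the referenced account doc's _id.
    return {**txn,
            "type": "transaction",
            "accountId": "account-" + str(txn["accountId"])}

def bulk_payload(records):
    """Turn a mixed stream of (kind, record) pairs into one _bulk_docs body."""
    docs = [account_doc(rec) if kind == "account" else transaction_doc(rec)
            for kind, rec in records]
    return {"docs": docs}

# Accounts intermixed with transactions, as in the original question.
stream = [("account", {"accountId": 1}),
          ("transaction", {"accountId": 1, "amount": 10}),
          ("account", {"accountId": 2}),
          ("transaction", {"accountId": 1, "amount": 5})]
payload = bulk_payload(stream)
# POST this payload as JSON to http://localhost:5984/<db>/_bulk_docs
```

Because the ids are derived purely from the input data, the whole mixed stream can go into a single bulk request with no ordering constraint between accounts and their transactions.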


On Mon, Apr 6, 2009 at 5:54 PM, David Mitchell <monch1962@gmail.com> wrote:
> Hello everyone,
> I'm having trouble working out how to implement bulk loading of data in a
> particular application I'm working on.
>
> Making it as simple to understand as possible, I've got two types of
> document that need to be stored:
> - "accounts", which includes all the information about a user account (e.g.
> name of the user, account ID, address, account creation date, ...)
> - "transactions", which are tied to a specific account.  Transaction
> information would include e.g. a transaction ID, account ID, transaction
> date, ...  Importantly, I don't know which fields I'll receive in my
> "transaction" records, so I need a schema-less storage model
>
> If this were SQL, I'd have 2 separate tables, with a foreign key from each
> "transactions" record pointing to a record in "accounts".  Nice and simple
> for my SQL-trained brain to work with, but the data I have to work with is
> inherently schema-less, so an RDBMS isn't going to work.
>
> With CouchDB, I've got this data going into a single database.  This seems
> to be the accepted best practice, and makes sense for this specific
> application.  This works fine now, but is taking too long - I can't keep up
> with the rate of incoming data as long as I'm loading it in one record at a
> time.
>
> Assume the following naming conventions:
> - A1 is account number 1, A2 is account number 2, ...
> - T1A1 is the first transaction against account number 1, T3A4 is the third
> transaction against account number 4
>
> The data I'm loading may come in the following sequence: A1, T1A1, A2, T2A1,
> A3, T3A1, T1A2, T1A3, A4, T2A2, ...  In other words, I'm receiving new
> account data intermixed with new transaction data.  I'll never receive a
> transaction for an account that doesn't already exist.  Again, nothing
> unusual for a real life application.
>
> I'd really like to be bulk-loading in the data, as the need to load it
> quickly overrides all other requirements at this point.  However, as I
> understand it, bulk loading the data will require that accounts already
> exist for any transactions, and that's difficult given the intermixing of
> account and transaction data coming in.
>
> One possibility is that I could conceivably force the end of a bulk load
> "transaction" every time I see a new account number; doing that would ensure
> that I'm never trying to generate a transaction against an account that
> isn't already in the database.  However, I'm wondering if this is the best
> way of dealing with this situation, which is presumably fairly common.
>
> Any thoughts/ideas/suggestions welcome.
>
> Thanks in advance
>
> Dave M.
>
