couchdb-user mailing list archives

From David Mitchell <monch1...@gmail.com>
Subject How to implement bulk loading with a "foreign key" involved?
Date Tue, 07 Apr 2009 00:54:23 GMT
Hello everyone,
I'm having trouble working out how to implement bulk loading of data in a
particular application I'm working on.

To keep things as simple as possible: I've got two types of
document that need to be stored:
- "accounts", which includes all the information about a user account (e.g.
name of the user, account ID, address, account creation date, ...)
- "transactions", which are tied to a specific account.  Transaction
information would include e.g. a transaction ID, account ID, transaction
date, ...  Importantly, I don't know which fields I'll receive in my
"transaction" records, so I need a schema-less storage model

If this were SQL, I'd have 2 separate tables, with a foreign key from each
"transactions" record pointing to a record in "accounts".  Nice and simple
for my SQL-trained brain to work with, but the data I have to work with is
inherently schema-less, so an RDBMS isn't going to work.

With CouchDB, I've got this data going into a single database.  This seems
to be the accepted best practice, and makes sense for this specific
application.  It works, but it's too slow - I can't keep up with the rate
of incoming data as long as I'm loading one record at a time.
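For context, CouchDB's bulk interface is a single POST of a `{"docs": [...]}` body to `/dbname/_bulk_docs`.  A minimal sketch of what I mean by "bulk loading" (the `bulk_load` helper and the URL are just my own illustration, not anything from a library):

```python
import json
import urllib.request

def bulk_docs_payload(docs):
    """Build the JSON body the _bulk_docs endpoint expects:
    a single object whose "docs" member is an array of documents."""
    return json.dumps({"docs": docs}).encode("utf-8")

def bulk_load(db_url, docs):
    """POST many documents to CouchDB in one round trip.
    db_url is e.g. "http://localhost:5984/mydb"."""
    req = urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=bulk_docs_payload(docs),
        headers={"Content-Type": "application/json"},
    )
    # The response is a JSON array with one status entry per document.
    return urllib.request.urlopen(req)
```

One request per batch instead of one per document is where the speedup comes from.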

Assume the following naming conventions:
- A1 is account number 1, A2 is account number 2, ...
- T1A1 is the first transaction against account number 1, T3A4 is the third
transaction against account number 4

The data I'm loading may come in the following sequence: A1, T1A1, A2, T2A1,
A3, T3A1, T1A2, T1A3, A4, T2A2, ...  In other words, I'm receiving new
account data intermixed with new transaction data.  I'll never receive a
transaction for an account that doesn't already exist.  Again, nothing
unusual for a real-life application.

I'd really like to be bulk-loading the data, as the need to load it
quickly overrides all other requirements at this point.  However, as I
understand it, bulk loading will require that the account already exists
for any transaction being loaded, and that's difficult given the
intermixing of incoming account and transaction data.

One possibility is that I could force the end of a bulk-load batch every
time I see a new account number; doing that would ensure that I'm never
inserting a transaction against an account that isn't already in the
database.  However, I'm wondering whether this is the best way of dealing
with this situation, which is presumably fairly common.
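To make the flush-on-new-account idea concrete, here's a sketch of the batching logic I have in mind (pure application-side buffering; the `flush_fn` callback would do the actual `_bulk_docs` POST, and all the names are my own):

```python
class BatchLoader:
    """Buffer incoming records and flush the buffer whenever a new
    account record arrives.  Because an account always arrives before
    its transactions, every transaction ends up in either the same
    batch as its account or a later one - so the account is in the
    database by the time the transaction is committed."""

    def __init__(self, flush_fn, max_batch=500):
        self.flush_fn = flush_fn      # called with a list of records
        self.max_batch = max_batch    # safety cap on batch size
        self.buffer = []

    def add(self, record):
        # A new account ends the current batch before being buffered.
        if record.get("type") == "account" and self.buffer:
            self.flush()
        self.buffer.append(record)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
```

Feeding it the sample sequence A1, T1A1, A2, T2A1 produces the batches [A1, T1A1] and [A2, T2A1] (after a final flush), which preserves the "account before its transactions" invariant.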

Any thoughts/ideas/suggestions welcome.

Thanks in advance

Dave M.
