incubator-couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Samuel Williams <space.ship.travel...@gmail.com>
Subject Re: Data modelling
Date Thu, 12 Jul 2012 12:18:33 GMT
Thanks, what you've said makes a lot of sense.

So, w.r.t. invoices, I get the feeling that it would be best to store all
items in a single document as that avoids inconsistency related issues..?
It also feels right since you'd be highly unlikely to have more than, say,
50 items on an invoice. Finally, when you display an invoice you almost
always want to display all items, so requesting individual documents seems
less efficient.

But, with bank accounts, transactions are sort of documents in their own
right.. meaning that it is ideally best to keep them separate, and you
might not want to view them all, or you might keep a running total. I was
thinking to use statement objects which exist and keep track of credits,
debits and balance for a given period - along with individual transaction
documents.

I guess I'm just trying to figure out where and when to use a traditional
normalised approach ala typical SQL tables vs denormalized approach ala
typical document storage.

Kind regards,
Samuel

On 12 July 2012 23:07, Bernhard Gschwantner <bernhard@unserwein.at> wrote:

> Hi,
>
> That's more a question for the couchdb users list - I posted it also there.
> (not sure if that's the right way to do it, though)
>
> In the case of bank accounts and transactions, I would go for the second
> approach. CouchDB can happily handle millions of documents, and incremental
> map/reduce for views makes sure that unchanged transaction documents need
> not be calculated again.
>
> Account transactions are what the name says: transactions, and as such,
> they should be atomic. So if a transaction involves two accounts, then I
> would put both account numbers into the transaction document:
>
> {"_id": "transaction-key-1", ..., "bank-account-from":
> "bank-account-1", "bank-account-to": "bank-account-2"}
>
> With that construction, you can be sure that it will never happen that you
> deduct an amount from one account and never add it to the other. If you
> make this two transaction documents, It could be that you try to create
> both documents, but only one succeeds. Or if you replicate your database,
> the first one is replicated, and then the network connection goes down.
>
> Try to design your documents for eventual consistency,  so that you can
> avoid situations that could bring your database into an inconsistent state
> if some documents are missing.
>
>
> If you want to archive old transactions (they should never change anyway),
> you could do some processing and create account balances directly in the
> bank account documents (but keep in mind that this _could_ make your data
> inconsistent). One solution could be
> {"_id": "bank-account-1", ..., "cached-balance": 1234.56,
> "cached-balance-until": "2012-02-01T00:00:00.000Z"}
>
> Then you could set an "archived" flag on all transactions older than this
> date (if you have two accounts as proposed, you have to set two archive
> flags). You could then do a partial replication to a new database where you
> filter out any transactions that have the archived flag on both
> transactions, Tip: NEVER delete those transactions, not even if they are
> archived in another database. This WILL eventually lead to disaster ;-)
>
> Hope I could help!
>
> Cheers,
> Bernhard
>
> 2012/7/12 Samuel Williams <space.ship.traveller@gmail.com>
>
> > Hi,
> >
> > I need to model some relationships.
> >
> > (1) With a one-to-many relationship, should I keep the keys of the many
> > objects in the one, or for each many object have the key of the one, or
> > both?
> >
> > e.g.
> >
> > {"_id": "bank-account-1", ..., transactions: ["transaction-key-1",
> > "transaction-key-2", "transaction-key-3"]}
> > {"_id": "transaction-key-1", ...}
> > {"_id": "transaction-key-2", ...}
> > {"_id": "transaction-key-3", ...}
> >
> > OR
> >
> > {"_id": "bank-account-1", ...}
> > {"_id": "transaction-key-1", ..., "bank-account": "bank-account-1"}
> > {"_id": "transaction-key-2", ..., "bank-account": "bank-account-1"}
> > {"_id": "transaction-key-3", ..., "bank-account": "bank-account-1"}
> >
> > Both are a possibility. However, with the first approach, the document
> may
> > get very large. This leads into my next question:
> >
> > (2) W.r.t. denormalizing data - sometimes it naturally makes sense, e.g.
> > invoice with multiple line items stored in an array vs invoice with
> > multiple line item documents.
> >
> > With an invoice, once you make the invoice it doesn't usually change that
> > much and you also want the data together when you access it, e.g. to
> > display, however with a bank account the transactions are changing
> > frequently (e.g. new transactions every day).
> >
> > I was wondering if I should consider batching together transactions,
> e.g. a
> > document for each month's worth of transactions, or perhaps one per year,
> > etc. That way when modifying transactions, you don't need to specify the
> > entire huge document with 1000s of individual items.
> >
> > I would appreciate any feedback and suggestions/advice.
> >
> > Thanks
> > Samuel
> >
>
>
>
> --
>
> Bernhard Gschwantner
> Unser Wein G&U OG
> Kirchengasse 13/7, 1070 Wien
>
> mobil: +43 (6991) 971 32 96
> tel: +43 (1) 971 32 95
> e-mail: bernhard@unserwein.at
> twitter: @bernharduw <http://twitter.com/bernharduw>
> web: www.unserwein.at
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message