Return-Path: X-Original-To: apmail-couchdb-user-archive@www.apache.org Delivered-To: apmail-couchdb-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E36199669 for ; Tue, 10 Apr 2012 04:47:21 +0000 (UTC) Received: (qmail 62025 invoked by uid 500); 10 Apr 2012 04:47:20 -0000 Delivered-To: apmail-couchdb-user-archive@couchdb.apache.org Received: (qmail 61732 invoked by uid 500); 10 Apr 2012 04:47:15 -0000 Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@couchdb.apache.org Delivered-To: mailing list user@couchdb.apache.org Received: (qmail 61681 invoked by uid 99); 10 Apr 2012 04:47:13 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Apr 2012 04:47:13 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of rizalp@gmail.com designates 209.85.210.180 as permitted sender) Received: from [209.85.210.180] (HELO mail-iy0-f180.google.com) (209.85.210.180) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Apr 2012 04:47:09 +0000 Received: by iage36 with SMTP id e36so9620251iag.11 for ; Mon, 09 Apr 2012 21:46:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:x-goomoji-body:date:message-id :subject:from:to:content-type; bh=TeOIMV2Z4XikRas4nzBYEiViRaq21+xs5aB07WpKWOE=; b=aLv7CQOVPcUEUYSa/LM3fwKjgjuiR2YFljaZIwngG9VcMfyWBrRzYvMqkm2xMW5xXz UJygX474OzKSr6mP5vZbmLyWDLOjcsLxa9qCv2EyJCJUFjaPHWI8/2lwam8xHKK+KhQf zbdxB19VblIt6TqW40RqwmP7vqUNDZJdMVXYOTMEFkoKholmOV5HkAy02yqF0L2VInZx /kYHtyTca/teGjTgzO8ZQwtF5K7m+lkEhUJCfMIZNLx0FC9dsLn5YvxZ3VGtNlhR6PHZ vPFM3TlW7RYUx+i0po4SHsTKHxHMYZEYQ4qb/E746OzAHunNX9YxIxB8+7p7UiRB+/49 xNsg== MIME-Version: 1.0 Received: by 10.50.192.168 with SMTP id hh8mr1075853igc.13.1334033208667; Mon, 09 Apr 2012 21:46:48 -0700 (PDT) Received: by 10.43.135.10 with HTTP; Mon, 9 Apr 2012 21:46:48 -0700 (PDT) In-Reply-To: References: X-Goomoji-Body: true Date: Tue, 10 Apr 2012 11:46:48 +0700 Message-ID: Subject: Re: Schema Design when migrating data from relational into document From: Mohammad Prabowo To: user@couchdb.apache.org Content-Type: multipart/related; boundary=e89a8f13ec64856dea04bd4bcf82 X-Virus-Checked: Checked by ClamAV on apache.org --e89a8f13ec64856dea04bd4bcf82 Content-Type: multipart/alternative; boundary=e89a8f13ec64856de804bd4bcf81 --e89a8f13ec64856de804bd4bcf81 Content-Type: text/plain; charset=ISO-8859-1 Thanks Andrew for bringing that into consideration [?] On Tue, Apr 10, 2012 at 7:34 AM, Andrew Woodcock wrote: > Bear in mind as well how often you will be updating certain information: > each update creates a document revision, so a large document where a couple > of fields (or even just one) are frequently updated can lead to increased > storage requirements and will also impact replication: there will be > frequent replication of a large document where only a small part is > actually changing. In a scenario like that, it may well be better to have > the frequently updating field(s) in separate documents. > > Regards, > > Andrew > > On 9 April 2012 17:26, Mohammad Prabowo wrote: > > > Thanks! I had read somewhere that there is a tradeoff between embedding > the > > data (example 1) or more normalized document (example 2). It's more of a > > choice between data locality, disk space, and querying flexibilities. I > > guess since every query must go trough views, the speed benefits of data > > locality is therefore reduced > > > > On Mon, Apr 9, 2012 at 9:01 PM, Keith Gable > >wrote: > > > > > I'd go the first route, but salaries and titles should be arrays of > > hashes: > > > > > > "titles": [ > > > { "name": "xxx", "from": "xxx", "to": "xxx" } > > > ] > > > > > > If you want to decouple the data, like if you wanted a list of all > > titles, > > > you'd use CouchDB views. > > > On Apr 9, 2012 6:07 AM, "Mohammad Prabowo" wrote: > > > > > > > Hi, suppose i have relational db with schema like this > > > > > > > > employees-schema< > > > > http://dev.mysql.com/doc/employee/en/images/employees-schema.png> > > > > > > > > I want to try converting it into document. I have two question: > > > > > > > > 1. The main strength of Document is that it is 'self contained'. > > > Meaning > > > > we don't need to do JOIN stuff, and all data that is needed are > > > contained > > > > within documents. So, should i choose to use nested documents like > > > this : > > > > > > > > { > > > > "emp_no": "...", > > > > "birth_date": "...", > > > > "first_name": "..", > > > > "last_name": "..", > > > > "gender": "..", > > > > "hire_date": "..", > > > > "titles": { > > > > "title": "...", > > > > "from_date": "...", > > > > "to_date": "..." > > > > }, > > > > "salaries": { > > > > "salary": "...", > > > > "from_date": "...", > > > > "to_date": "..." > > > > } > > > > } > > > > > > > > > > > > or using different documents like this : > > > > > > > > [ > > > > { > > > > "doc_name": "employees", > > > > "emp_no": "...", > > > > "birth_date": "...", > > > > "first_name": "..", > > > > "last_name": "..", > > > > "gender": "..", > > > > "hire_date": ".." > > > > }, > > > > { > > > > "doc_name": "titles", > > > > "from_date": "...", > > > > "to_date": "..." > > > > }, > > > > { > > > > "doc_name": "salaries", > > > > "salary": "...", > > > > "from_date": "...", > > > > "to_date": "..." > > > > } > > > > ] > > > > > > > > > > > > 2. I want to benchmark MySQL and CouchDB with > > > > YCSB. > > > > Is there are db layer that has been built for CouchDB ? > > > > > > > > Thanks in advance > > > > > > > > > > --e89a8f13ec64856de804bd4bcf81 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Thanks Andrew for bringing that into consideration =A0
On Tue, Apr 10, 2012 at 7:34 AM, Andrew Woodcock <andywoodcock11@gmail.com> wrote:
Bear in mind as well how often you will be updating certain information: each update creates a document revision, so a large document where a couple=
of fields (or even just one) are frequently updated can lead to increased storage requirements and will also impact replication: there will be
frequent replication of a large document where only a small part is
actually changing. In a scenario like that, it may well be better to have the frequently updating field(s) in separate documents.

Regards,

Andrew

On 9 April 2012 17:26, Mohammad Prabowo <rizalp@gmail.com> wrote:

> Thanks! I had read somewhere that there is a tradeoff between embeddin= g the
> data (example 1) or more normalized document (example 2). It's mor= e of a
> choice between data locality, disk space, and querying flexibilities. = I
> guess since every query must go trough views, the speed benefits of da= ta
> locality is therefore reduced
>
> On Mon, Apr 9, 2012 at 9:01 PM, Keith Gable <ziggy@ignition-project.com
> >wrote:
>
> > I'd go the first route, but salaries and titles should be arr= ays of
> hashes:
> >
> > "titles": [
> > { "name": "xxx", "from": "xxx&= quot;, "to": "xxx" }
> > ]
> >
> > If you want to decouple the data, like if you wanted a list of al= l
> titles,
> > you'd use CouchDB views.
> > On Apr 9, 2012 6:07 AM, "Mohammad Prabowo" <rizalp@gmail.com> wrote:
> >
> > > Hi, suppose i have relational db with schema like this
> > >
> > > employees-schema<
> > > http://dev.mysql.com/doc/employee/en/ima= ges/employees-schema.png>
> > >
> > > I want to try converting it into document. I have two questi= on:
> > >
> > > =A0 1. The main strength of Document is that it is 'self= contained'.
> > Meaning
> > > =A0 we don't need to do JOIN stuff, and all data that is= needed are
> > contained
> > > =A0 within documents. So, should i choose to use nested docu= ments like
> > this :
> > >
> > > =A0 {
> > > =A0 =A0 =A0 "emp_no": "...",
> > > =A0 =A0 =A0 "birth_date": "...",
> > > =A0 =A0 =A0 "first_name": "..",
> > > =A0 =A0 =A0 "last_name": "..",
> > > =A0 =A0 =A0 "gender": "..",
> > > =A0 =A0 =A0 "hire_date": "..",
> > > =A0 =A0 =A0 "titles": {
> > > =A0 =A0 =A0 =A0 =A0 "title": "...",
> > > =A0 =A0 =A0 =A0 =A0 "from_date": "...",<= br> > > > =A0 =A0 =A0 =A0 =A0 "to_date": "..."
> > > =A0 =A0 =A0 },
> > > =A0 =A0 =A0 "salaries": {
> > > =A0 =A0 =A0 =A0 =A0 "salary": "...",
> > > =A0 =A0 =A0 =A0 =A0 "from_date": "...",<= br> > > > =A0 =A0 =A0 =A0 =A0 "to_date": "..."
> > > =A0 =A0 =A0 }
> > > =A0 }
> > >
> > >
> > > =A0 or using different documents like this :
> > >
> > > =A0 [
> > > =A0 =A0 =A0 {
> > > =A0 =A0 =A0 =A0 =A0 "doc_name": "employees&qu= ot;,
> > > =A0 =A0 =A0 =A0 =A0 "emp_no": "...",
> > > =A0 =A0 =A0 =A0 =A0 "birth_date": "...",=
> > > =A0 =A0 =A0 =A0 =A0 "first_name": "..",<= br> > > > =A0 =A0 =A0 =A0 =A0 "last_name": "..", > > > =A0 =A0 =A0 =A0 =A0 "gender": "..",
> > > =A0 =A0 =A0 =A0 =A0 "hire_date": ".." > > > =A0 =A0 =A0 },
> > > =A0 =A0 =A0 {
> > > =A0 =A0 =A0 =A0 =A0 "doc_name": "titles"= ,
> > > =A0 =A0 =A0 =A0 =A0 "from_date": "...",<= br> > > > =A0 =A0 =A0 =A0 =A0 "to_date": "..."
> > > =A0 =A0 =A0 },
> > > =A0 =A0 =A0 {
> > > =A0 =A0 =A0 =A0 =A0 "doc_name": "salaries&quo= t;,
> > > =A0 =A0 =A0 =A0 =A0 "salary": "...",
> > > =A0 =A0 =A0 =A0 =A0 "from_date": "...",<= br> > > > =A0 =A0 =A0 =A0 =A0 "to_date": "..."
> > > =A0 =A0 =A0 }
> > > =A0 ]
> > >
> > >
> > > =A0 2. I want to benchmark MySQL and CouchDB with
> > > YCSB<https://github.com/brianfrankcooper/YCSB/wiki&g= t;.
> > > =A0 Is there are db layer that has been built for CouchDB ?<= br> > > >
> > > Thanks in advance
> > >
> >
>

--e89a8f13ec64856de804bd4bcf81-- --e89a8f13ec64856dea04bd4bcf82--