From: Tim Black <tim@alwaysreformed.com>
Date: Wed, 20 Nov 2013 13:28:24 -0600
To: user@couchdb.apache.org
Subject: Re: couchdb data structure

John,

On 11/19/2013 04:28 AM, John Norris wrote:
> I am trying to retrofit an existing web app using SQL to couchdb with
> ektorp. I have setup couchdb and run through some tutorials (ektorp,
> seven databases in seven weeks, definitive guide).
> If I have several objects (represented as pojos) then in SQL this
> would probably equate to several tables within the database.
> But in couchdb, a database is a number of documents? And those
> documents can represent several object types? So do I need each
> document to have a field representing what type it is? (eg a field
> that is unique to that document type).

As far as I can understand your question, it depends on whether each pojo
object contains many rows of similar data. If it doesn't, represent each
object as one doc, like this:

{"_id": "12345", "type": "pojo"}

> Or does each document type go in its own database?

If each pojo object contains many rows of similar data, I'd probably break
it up into one document per row and keep all the pojos in the same
database, so I could query across all pojos. I don't think it's possible
to query across multiple databases in CouchDB.
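To pull just one type back out of a shared database, a map view keyed on
the type field is all you need. Here's a rough sketch (untested as I write
this; the "pojos" database and the "app"/"by_type" design doc and view
names are just placeholders for whatever you end up using):

# Create a design doc whose view indexes every doc by its type field
curl -X PUT 'http://localhost:5984/pojos/_design/app' \
     -H "Content-Type: application/json" \
     -d '{"views": {"by_type": {"map": "function(doc) { if (doc.type) { emit(doc.type, null); } }"}}}'

# Fetch all docs whose type is "pojo"; include_docs=true returns the full documents
curl 'http://localhost:5984/pojos/_design/app/_view/by_type?key=%22pojo%22&include_docs=true'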
Here are two files I use to migrate data from sqlite to CouchDB, which I
offer here as an example for any who are doing similar work:

csv2json.py:
----------------
#!/usr/bin/env python
import csv, sys, json

# Open the CSV file passed as the first command line argument
f = open(sys.argv[1], 'r')
reader = csv.DictReader(f)

rows = []
for row in reader:
    # Iterate over a copy of the keys, since the loop modifies the dict
    for key in list(row.keys()):
        # Remove underscore from beginning of attribute names
        if key.startswith('_'):
            new_key = key.lstrip('_')
            row[new_key] = row[key]
            del row[key]
        # Convert id columns to a namespaced _id to avoid conflicts
        if key == 'id':
            row['_id'] = sys.argv[2] + '.' + row['id']
            del row['id']
        if key == 'user_id':
            row['_id'] = sys.argv[2] + '.' + row['user_id']
            del row['user_id']
        # 'type' is used for the document type, so rename this column
        if key == 'type':
            row['job'] = row['type']
            del row['type']
    # Insert document collection field, which equals the sqlite table name
    row['collection'] = sys.argv[2]
    rows.append(row)

# Wrap in CouchDB _bulk_docs JSON format
out = '{"docs":%s}' % json.dumps(rows)
print(out)
-----------------

sqlite2csv2couchdb.sh
------------------
#!/bin/bash
# Get the database from the production site
scp remote_host:path/to/sqlite.db .

DB="http://username:password@localhost:5984/projects"

# TODO: Use filtered replication to save the design docs
# Delete old copy of database
curl -X DELETE $DB
# Wait a second to let CouchDB delete the old database.
sleep 1
# Create new copy of database
curl -X PUT $DB
# TODO: Set permissions on couchdb database

# Create list of tables
tables=$(sqlite3 sqlite.db 'SELECT tbl_name FROM sqlite_master WHERE type="table"')

while read -r line; do
    # Filter out the visits tables
    if [ "$line" != "visit" ] && [ "$line" != "visit_identity" ]
    then
        # Get table of data as CSV
        rows=$(sqlite3 -csv -header sqlite.db "SELECT * FROM $line")
        echo "$rows" > tmp.csv
        rows=$(python csv2json.py tmp.csv $line)
        # Write JSON to a file to avoid a curl error about too many command line arguments
        echo "$rows" > tmp.json
        # Insert table into couchdb
        curl -d @tmp.json -H "Content-Type: application/json" -X POST $DB/_bulk_docs &> /dev/null
    fi
done <<< "$tables"

rm tmp.json
rm tmp.csv
rm sqlite.db
---------------------

Tim
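P.S. If you just want to load one table by hand rather than run the whole
script, the same steps look roughly like this (the "sometable" name and
the projects DB URL are placeholders, as above):

sqlite3 -csv -header sqlite.db "SELECT * FROM sometable" > tmp.csv
python csv2json.py tmp.csv sometable > tmp.json
curl -d @tmp.json -H "Content-Type: application/json" -X POST 'http://username:password@localhost:5984/projects/_bulk_docs'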