From: Tim Black <tim@alwaysreformed.com>
Date: Wed, 20 Nov 2013 13:28:24 -0600
To: user@couchdb.apache.org
Subject: Re: couchdb data structure

John,

On 11/19/2013 04:28 AM, John Norris wrote:
> I am trying to retrofit an existing web app using SQL to couchdb with
> ektorp. I have setup couchdb and run through some tutorials (ektorp,
> seven databases in seven weeks, definitive guide).
> If I have several objects (represented as pojos) then in SQL this
> would probably equate to several tables within the database.
> But in couchdb, a database is a number of documents? And those
> documents can represent several object types? So do I need each
> document to have a field representing what type it is? (eg a field
> that is unique to that document type).

As far as I can understand your question, it depends on whether each pojo
object contains many rows of similar data. If it doesn't, represent each
object as one doc, like this:

{"_id": "12345", "type": "pojo"}

> Or does each document type go in its own database?

If each pojo object contains many rows of similar data, I'd probably break
it up into one document per row and keep all the pojos in the same
database, so I could query across all pojos. I don't think it's possible
to query across multiple databases in CouchDB.
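To pull just one type back out of a shared database, a map view keyed on
the type field is all you need. Here's a rough sketch (untested as I write
this; the "pojos" database and the "app"/"by_type" design doc and view
names are just placeholders for whatever you end up using):

# Create a design doc whose view indexes every doc by its type field
curl -X PUT 'http://localhost:5984/pojos/_design/app' \
     -H "Content-Type: application/json" \
     -d '{"views": {"by_type": {"map": "function(doc) { if (doc.type) { emit(doc.type, null); } }"}}}'

# Fetch all docs whose type is "pojo"; include_docs=true returns the full documents
curl 'http://localhost:5984/pojos/_design/app/_view/by_type?key=%22pojo%22&include_docs=true'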
Here are two files I use to migrate data from sqlite to CouchDB, which I
offer here as an example for any who are doing similar work:

csv2json.py:
----------------
#!/usr/bin/env python
import csv, sys, json

# Open the CSV file passed as the first command line argument
f = open(sys.argv[1], 'r')
reader = csv.DictReader(f)

rows = []
for row in reader:
    # Iterate over a copy of the keys, since the loop modifies the dict
    for key in list(row.keys()):
        # Remove underscore from beginning of attribute names
        if key.startswith('_'):
            new_key = key.lstrip('_')
            row[new_key] = row[key]
            del row[key]
        # Convert id columns to a namespaced _id to avoid conflicts
        if key == 'id':
            row['_id'] = sys.argv[2] + '.' + row['id']
            del row['id']
        if key == 'user_id':
            row['_id'] = sys.argv[2] + '.' + row['user_id']
            del row['user_id']
        # 'type' is used for the document type, so rename this column
        if key == 'type':
            row['job'] = row['type']
            del row['type']
    # Insert document collection field, which equals the sqlite table name
    row['collection'] = sys.argv[2]
    rows.append(row)

# Wrap in CouchDB _bulk_docs JSON format
out = '{"docs":%s}' % json.dumps(rows)
print(out)
-----------------

sqlite2csv2couchdb.sh
------------------
#!/bin/bash
# Get the database from the production site
scp remote_host:path/to/sqlite.db .

DB="http://username:password@localhost:5984/projects"

# TODO: Use filtered replication to save the design docs
# Delete old copy of database
curl -X DELETE $DB
# Wait a second to let CouchDB delete the old database.
sleep 1
# Create new copy of database
curl -X PUT $DB
# TODO: Set permissions on couchdb database

# Create list of tables
tables=$(sqlite3 sqlite.db 'SELECT tbl_name FROM sqlite_master WHERE type="table"')

while read -r line; do
    # Filter out the visits tables
    if [ "$line" != "visit" ] && [ "$line" != "visit_identity" ]
    then
        # Get table of data as CSV
        rows=$(sqlite3 -csv -header sqlite.db "SELECT * FROM $line")
        echo "$rows" > tmp.csv
        rows=$(python csv2json.py tmp.csv $line)
        # Write JSON to a file to avoid a curl error about too many command line arguments
        echo "$rows" > tmp.json
        # Insert table into couchdb
        curl -d @tmp.json -H "Content-Type: application/json" -X POST $DB/_bulk_docs &> /dev/null
    fi
done <<< "$tables"

rm tmp.json
rm tmp.csv
rm sqlite.db
---------------------

Tim
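P.S. If you just want to load one table by hand rather than run the whole
script, the same steps look roughly like this (the "sometable" name and
the projects DB URL are placeholders, as above):

sqlite3 -csv -header sqlite.db "SELECT * FROM sometable" > tmp.csv
python csv2json.py tmp.csv sometable > tmp.json
curl -d @tmp.json -H "Content-Type: application/json" -X POST 'http://username:password@localhost:5984/projects/_bulk_docs'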