couchdb-user mailing list archives

From Alexander Shorin <kxe...@gmail.com>
Subject Re: couchdb data structure
Date Thu, 21 Nov 2013 08:55:42 GMT
Hi Tim,

https://speakerdeck.com/wohali/10-common-misconceptions-about-apache-couchdb?slide=4
(;

I just agree with her. It isn't simple to import SQL data into NoSQL,
especially document-oriented, storage: different concepts and
requirements apply to the data there.
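
For example (field names invented for illustration), what lives in
several joined rows in SQL usually becomes one self-contained document
in CouchDB:

{ "_id": "person.42", "name": "Alice",
  "addresses": [ { "city": "Portland" } ] }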
--
,,,^..^,,,


On Wed, Nov 20, 2013 at 11:28 PM, Tim Black <tim@alwaysreformed.com> wrote:
> John,
>
> On 11/19/2013 04:28 AM, John Norris wrote:
>> I am trying to retrofit an existing web app using SQL to couchdb with
>> ektorp. I have setup couchdb and run through some tutorials (ektorp,
>> seven databases in seven weeks, definitive guide).
>> If I have several objects (represented as pojos) then in SQL this
>> would probably equate to several tables within the database.
>> But in couchdb, a database is a number of documents? And those
>> documents can represent several object types? So do I need each
>> document to have a field representing what type it is? (eg a field
>> that is unique to that document type).
> So far as I understand your question, it depends on whether each
> pojo object contains many rows of similar data. If it doesn't,
> represent each object as one doc, like this:
>
> { _id:"12345", type:"pojo" }
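>
> A type field like that also gives you something to key a view on. A
> minimal (untested) sketch of a design doc that indexes docs by type:
>
> { "_id": "_design/app",
>   "views": {
>     "by_type": {
>       "map": "function(doc) { if (doc.type) { emit(doc.type, null); } }"
>     }
>   }
> }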
>
>> Or does each document type go in its own database?
> If each pojo object contains many rows of similar data, I'd probably
> break it up into one document per row and keep all the pojos in the same
> database, so I could query across all pojos.  I don't think it's
> possible to query across multiple databases in CouchDB.
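>
> Within one database, though, a view like by_type above makes "query
> across all pojos" a single request (the URL assumes the same projects
> database as the script below):
>
> curl 'http://localhost:5984/projects/_design/app/_view/by_type?key=%22pojo%22&include_docs=true'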
>
> Here are two files I use to migrate data from sqlite to CouchDB, which I
> offer here as an example for any who are doing similar work:
>
> csv2json.py:
>
> ----------------
> #!/usr/bin/env python
>
> # Usage: csv2json.py <input.csv> <table_name>
> # Reads a CSV dump of one sqlite table and prints a CouchDB
> # _bulk_docs payload to stdout.
>
> import csv, sys, json
>
> # Open the file passed as a command line argument
> with open(sys.argv[1], 'r') as f:
>     reader = csv.DictReader(f)
>     rows = []
>     for row in reader:
>         # Insert a document collection field, which equals the
>         # sqlite table name
>         row['collection'] = sys.argv[2]
>         # Copy the keys, since the dict is modified while iterating
>         for key in list(row.keys()):
>             # Remove underscores from the beginning of attribute
>             # names (a leading underscore is reserved by CouchDB)
>             if key.startswith('_'):
>                 new_key = key.lstrip('_')
>                 row[new_key] = row[key]
>                 del row[key]
>             # Convert the id column to a namespaced _id to avoid
>             # conflicts between tables
>             if key == 'id':
>                 row['_id'] = sys.argv[2] + '.' + row['id']
>                 del row['id']
>             if key == 'user_id':
>                 row['_id'] = sys.argv[2] + '.' + row['user_id']
>                 del row['user_id']
>             # Rename this app's type column to job
>             if key == 'type':
>                 row['job'] = row['type']
>                 del row['type']
>         rows.append(row)
>
> # Wrap in CouchDB _bulk_docs JSON format
> out = '{"docs":%s}' % json.dumps(rows)
>
> print(out)
> -----------------
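>
> Run it with the CSV file and the name of the table it came from, e.g.
> for a hypothetical users table:
>
>     python csv2json.py users.csv users > users.json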
>
> sqlite2csv2couchdb.sh
>
> ------------------
> #!/bin/bash
>
> # Get the sqlite database from the production site
> scp remote_host:path/to/sqlite.db .
>
> DB="http://username:password@localhost:5984/projects"
>
> # TODO: Use filtered replication to save the design docs
> # Delete the old copy of the database
> curl -X DELETE "$DB"
> # Wait a second to let CouchDB delete the old database.
> sleep 1
> # Create a new copy of the database
> curl -X PUT "$DB"
>
> # TODO: Set permissions on the couchdb database
> # Create the list of tables
> tables=$(sqlite3 sqlite.db \
>     'SELECT tbl_name FROM sqlite_master WHERE type="table"')
>
> while read -r line; do
>     # Filter out the visits tables
>     if [ "$line" != "visit" ] && [ "$line" != "visit_identity" ]
>     then
>         # Dump the table's rows as CSV, with a header line
>         rows=$(sqlite3 -csv -header sqlite.db "SELECT * FROM $line")
>         echo "$rows" > tmp.csv
>         rows=$(python csv2json.py tmp.csv "$line")
>
>         # Write the JSON to a file to avoid a curl error from having
>         # too many command line arguments
>         echo "$rows" > tmp.json
>
>         # Insert the table into couchdb
>         curl -d @tmp.json -H "Content-Type: application/json" \
>             -X POST "$DB/_bulk_docs" &> /dev/null
>     fi
> done <<< "$tables"
>
> rm tmp.json
> rm tmp.csv
> rm sqlite.db
> ---------------------
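>
> After the import, a quick sanity check (not part of the script) is to
> fetch the database info and compare its doc_count with the number of
> rows you exported:
>
>     curl http://localhost:5984/projects    # check "doc_count"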
>
> Tim
>
