incubator-couchdb-user mailing list archives

From Cliff Williams <cliffywi...@aol.com>
Subject Re: Uploading CSV data to Couchapp
Date Mon, 04 Apr 2011 11:23:23 GMT
David,

This is my "snippet" for consuming a continuous _changes feed using 
node. Take a look and see what you think. We can take it offline to 
discuss in detail if you wish.

Best regards

cliff

var sys = require("sys"),
    http = require("http"),
    host = "10.0.0.10",
    port = 5984,
    database = "test",
    changesurl = "http://" + host + ":" + port + "/" + database +
        "/_changes?feed=continuous&include_docs=true&heartbeat=10000",
    infourl = "http://" + host + ":" + port + "/" + database,
    stream = new process.EventEmitter(),
    timeout = 0,
    info;

sys.puts(changesurl);

// Set up a connection for the initial database-info request
var couchdbconnection = http.createClient(port, host);

// Handlers to monitor the actual connection. CouchDB closes this
// connection once the info response is complete, so by the time
// 'close' fires the data handler below has populated 'info'.
couchdbconnection.addListener('close', function() {
    return stream.emit('seq', info.update_seq);
});
couchdbconnection.addListener('error', function() {
    console.log("couchdbconnection listener ....error");
});

// Send the request for the database info (to get the current update_seq)
sys.puts(infourl);
var request = couchdbconnection.request('GET', infourl);
request.end();

// Listen for the response
request.addListener("response", function(res) {
    res.addListener('data', function(infodata) {
        // info comes in as a string
        info = JSON.parse(infodata);
    });
});

// Wait until the code used to get the last sequence number completes
stream.addListener('seq', function(lastseq) {
    // Set up a second, long-lived connection for the changes feed
    var changesconnection = http.createClient(port, host);
    changesconnection.setTimeout(timeout); // 0 = never time out the feed

    // Handlers to monitor the actual connection
    changesconnection.addListener('close', function() {
        console.log("changesconnection listener .....connection closed");
    });
    changesconnection.addListener('error', function() {
        console.log("changesconnection listener ....error");
    });

    // Send the request, resuming from the last known sequence number
    var changesrequest = changesconnection.request('GET',
        changesurl + "&since=" + lastseq);
    changesrequest.end();

    // Listen for responses. "data" normally fires once per line of feed
    // output, so no special buffering is done here (strictly speaking a
    // chunk can end mid-line, so a line buffer would be more robust).
    changesrequest.addListener("response", function(res) {
        res.addListener('data', function(changedata) {
            // CouchDB sends an empty line as the "heartbeat"
            if (changedata == "\n") {
                console.log("heartbeat");
            } else {
                // The change comes in as a string, so create a JSON
                // object to add the database name, then re-stringify
                var changes = JSON.parse(changedata);
                changes.database = database;
                console.log("Actual data changed: " + JSON.stringify(changes));
            }
        });
    });
});
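One caveat on the buffering point above: a chunk from the continuous feed can end mid-line, so splitting on newlines with a small carry-over buffer is safer than parsing each chunk directly. Here is a minimal sketch of such a line parser (plain JavaScript, no CouchDB connection needed; the `onChange` callback name is just for illustration):

```javascript
// Buffer incoming chunks and emit one complete JSON change per line.
// The continuous _changes feed sends one JSON object per line, plus
// bare "\n" heartbeats, so we split on "\n" and keep any partial
// trailing line in the buffer until the next chunk arrives.
function makeLineParser(onChange) {
    var buffer = "";
    return function (chunk) {
        buffer += chunk;
        var lines = buffer.split("\n");
        buffer = lines.pop();           // partial line (or "") stays buffered
        lines.forEach(function (line) {
            if (line === "") return;    // heartbeat, nothing to parse
            onChange(JSON.parse(line));
        });
    };
}
```

It would plug straight into the data listener: `res.addListener('data', makeLineParser(function (change) { ... }));`.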

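For the CSV side David describes, each line can become one document and the whole batch can be POSTed to the database's `_bulk_docs` endpoint in a single request rather than one PUT per row. A rough sketch of the conversion step (it assumes simple comma-separated values with no quoting or escaping, and that the first row holds the field names):

```javascript
// Turn raw CSV text into a _bulk_docs payload: one document per data
// row, keyed by the header row's column names. Assumes simple CSV
// with no quoted or escaped commas.
function csvToBulkDocs(csvtext) {
    var rows = csvtext.split("\n").filter(function (r) { return r !== ""; });
    var fields = rows.shift().split(",");
    var docs = rows.map(function (row) {
        var values = row.split(","), doc = {};
        fields.forEach(function (field, i) { doc[field] = values[i]; });
        return doc;
    });
    return JSON.stringify({ docs: docs });
}
```

The resulting string would be POSTed to `http://host:5984/database/_bulk_docs` with `Content-Type: application/json`.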

On 04/04/11 11:50, David Mitchell wrote:
> Hi Cliff,
>
> What you describe doing with Node.js sounds exactly like what I 
> want to do - process the uploads transparently in the background, 
> entirely within the context of my Couchapp.
>
> I've searched around for info on using Node.js and CouchDB like this, 
> but only found links that describe the technique in very broad detail. 
>  Do you know of any links that describe it in reasonable detail?
>
> I'm an experienced Python/Ruby/C/C#/... coder and an occasional Erlang 
> coder, but stuff like Node.js is completely new to me - I'm assuming 
> that if I could see how to do this sort of thing in a fairly concise 
> example, it'd trigger the "aha" moment in my brain and then I'd be off 
> and running...
>
> Thanks again
>
> Dave M.
>
> On 4 April 2011 19:34, Cliff Williams <cliffywills@aol.com 
> <mailto:cliffywills@aol.com>> wrote:
>
>     David,
>
>     I hope you are well.
>
>     I think that you have covered your options pretty well.
>
>
>     "- upload the data&  save it into a single "uploaded_csv" document in
>     CouchDB.  Within CouchDB, detect the presence of a new "uploaded_csv"
>     document, extract and process the content using Javascript and
>     save it into
>     multiple "data" records, with appropriate indexing, then dispose
>     of the
>     "uploaded_csv" document or mark it as "processed".  This seems
>     reasonably
>     straightforward, but I'm not sure how to detect the presence of a new
>     "uploaded_csv" document"
>
>     This is the approach that I would take.
>
>     CouchDB has an excellent _changes feed which can notify you in
>     real time (or can be set up to notify you) of any changes made to
>     a specific database.
>
>     I personally would use Node.js to monitor the changes feed and
>     process your CSV files (JavaScript, and very fast), but you could
>     of course use anything - Erlang or Python, say; Ruby's CSV
>     processing libraries are also quite good.
>
>     best regards
>
>     Cliff
>
>     On 04/04/11 08:58, David Mitchell wrote:
>
>         Hello all,
>
>         I'm just about to start on my first (wildly ambitious)
>         Couchapp.  I've had
>         quite a bit of Erlang experience, but not for the past couple
>         of years so
>         I'm a bit rusty.  I've had a tiny bit of experience with
>         CouchDB via various
>         Python scripts, but that's all been treating CouchDB as a
>         "black box"
>         database so I've currently got little knowledge of what it can
>         do beyond
>         being a document datastore.
>
>         Initially, I'm trying to understand my options for uploading
>         CSV files,
>         parsing out the content and storing them in CouchDB (one
>         CouchDB record per
>         line of CSV content).  While it's reasonably straightforward
>         to do this if I
>         was using e.g. Python as a batch load tool, I don't want to go
>         outside
>         Javascript for this project if I can avoid it.  The CSV files
>         are anywhere
>         from 1k-30k records, with 8-10 fields in each that are
>         straightforward
>         timestamps and floating point numbers.
>
>         For an old-school Web app with distinct database and app
>         server layers,
>         there's a straightforward option - upload the data to a file
>         on the web
>         server, then process the data out of the file and load it into
>         your
>         database.  Sure there's variations on this approach such as
>         saving data as a
>         database blob, but I'm looking for the best CouchApp-specific
>         approach if
>         one exists.
>
>         Options I can see:
>         - upload the data&  save it into a single "uploaded_csv"
>         document in
>         CouchDB.  Within CouchDB, detect the presence of a new
>         "uploaded_csv"
>         document, extract and process the content using Javascript and
>         save it into
>         multiple "data" records, with appropriate indexing, then
>         dispose of the
>         "uploaded_csv" document or mark it as "processed".  This seems
>         reasonably
>         straightforward, but I'm not sure how to detect the presence
>         of a new
>         "uploaded_csv" document (is there a cron equivalent in Couch?)
>         and I'd have
>         to track the progress of processing each uploaded CSV file to
>         detect when
>         they've been processed into "data" records
>         - upload the data&  save it into a single "uploaded_csv"
>         document in
>         CouchDB.  Have CouchDB running embedded in an Erlang app, and
>         use Erlang to
>         read the "uploaded_csv" data, then send a series of e.g. HTTP
>         PUTs to load
>         the data into multiple "data" records in CouchDB.  This just
>         seems ugly to
>         me, but I'm pretty confident I could get it working pretty easily
>         - upload the data and process it directly into "data" records
>         from a web
>         page served from CouchApp.  This seems like it could impact on
>         scalability
>         due to having long-running connections between client and
>         server, but at
>         least a user would know when their data has been uploaded and
>         processed
>         successfully with trivial extra work on my part
>         - upload the data, convert it to JSON on the client using
>         clientside
>         Javascript, then send it as a set of document uploads (i.e.
>         one document per
>         CSV record) from the client to the Couch server.  This would
>         let me parse
>         out any bogus CSV content without sending it to the server,
>         but I'll have
>         users running browsers on mobile devices and I'm not sure I
>         want to put that
>         processing load onto the client
>
>         Are there any "recommended" approaches for this type of task?
>          I suspect
>         this question and others I'll ask have probably already been
>         considered and
>         dealt with by various experts; if there's a "CouchApp
>         cookbook" with
>         recommended solutions for these and other common situations,
>         I'd appreciate
>         a pointer to it so I could start to answer my own questions.
>
>         Thanks in advance
>
>         Dave M.
>
>
