From: Mark Hahn
Date: Wed, 27 Nov 2013 13:13:45 -0800
Subject: simple math/statistics problem using map reduce
To: "couchdb-user@apache.org"

I'm not an expert on statistics (and I'm lazy) so I thought I'd pose my problem here. Consider it a holiday mind exercise while avoiding relatives.

I send customer-uploaded videos to Amazon Elastic Transcoder to generate a video for html5 consumption. Conversion takes anywhere from a few seconds to tens of minutes. I have no way to track progress, so I estimate the time to complete and show a fake progress indicator. I have been basing the estimate on the run-time of the video alone, and this is not working well at all. High bit-rate (big file) videos fare the worst.

I'm guessing there are two main parameters for estimating the conversion time: the file size and the run-time. The file size is a good estimate of input processing and the run-time is a good estimate of output processing. Amazon has been pretty consistent in their conversion times in the short run.

I have tons of data in my couchdb from previous conversions. I want to do regression analysis of these past runs to calculate parameters for estimation. I know the file size, run-time, and conversion time for each. I will use runLen * A + fileSize * B as the estimation formula. A and B will be calculated by solving runLen * A + fileSize * B = convTime from the samples. It would be nice to use a map-reduce to always have the latest estimate of A and B, if possible.
My first thought was to just find the average of each of the three input variables and solve for A and B using these averages. However, I'm pretty sure this would yield the wrong result, because the three values of each sample need to be used together, sample by sample (not sure). So I would like to have each map take one conversion sample and do the regression in the reduce. Can someone give me pointers on how to do this?
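
For reference, one way the map/reduce split asked about above could look, sketched in Python rather than as an actual CouchDB view. It assumes an ordinary least-squares fit with no intercept term, which is one reasonable reading of "solve runLen * A + fileSize * B = convTime from the samples". Averaging the three variables first leaves a single equation in two unknowns, so it cannot pin down A and B; keeping five running sums of products can. The map step emits those five products for one sample, the reduce step adds them element-wise (so it can also combine partial results, as a rereduce would), and A and B come from solving the 2x2 normal equations. The names map_sample, reduce_sums and solve_coefficients are hypothetical, as are the units.

```python
from typing import Iterable, List, Tuple

# Five running sums, in this order:
# sum(r*r), sum(r*f), sum(f*f), sum(r*c), sum(f*c)
# where r = runLen, f = fileSize, c = convTime (names taken from the post).
Sums = List[float]


def map_sample(run_len: float, file_size: float, conv_time: float) -> Sums:
    """Map step: emit the products that one conversion sample contributes."""
    return [
        run_len * run_len,
        run_len * file_size,
        file_size * file_size,
        run_len * conv_time,
        file_size * conv_time,
    ]


def reduce_sums(values: Iterable[Sums]) -> Sums:
    """Reduce step: element-wise sum of the emitted lists.

    Because the output has the same shape as the input, the same function
    can also combine partial results from earlier reduce calls.
    """
    total = [0.0] * 5
    for value in values:
        for i, x in enumerate(value):
            total[i] += x
    return total


def solve_coefficients(sums: Sums) -> Tuple[float, float]:
    """Solve the 2x2 normal equations for A and B using Cramer's rule.

    A * sum(r*r) + B * sum(r*f) = sum(r*c)
    A * sum(r*f) + B * sum(f*f) = sum(f*c)
    """
    s_rr, s_rf, s_ff, s_rc, s_fc = sums
    det = s_rr * s_ff - s_rf * s_rf
    if det == 0:
        # Happens when runLen and fileSize are exactly proportional
        # across all samples, so A and B cannot be separated.
        raise ValueError("samples do not determine A and B uniquely")
    a = (s_rc * s_ff - s_rf * s_fc) / det
    b = (s_rr * s_fc - s_rf * s_rc) / det
    return a, b


if __name__ == "__main__":
    # Made-up samples: (runLen, fileSize, convTime)
    samples = [(1.0, 2.0, 3.0), (2.0, 1.0, 4.5), (3.0, 5.0, 8.5)]
    sums = reduce_sums(map_sample(*s) for s in samples)
    print(solve_coefficients(sums))
```

The final solve step is cheap and only needs the five reduced numbers, so it could run in the client (or wherever the estimate is computed) each time the reduced value is read.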