Date: Mon, 13 Feb 2012 20:18:45 -0500
Subject: Re: Question about multiple keys with ranges
From: Mathieu Castonguay
To: user@couchdb.apache.org

Yes, it was me who misunderstood your example. I've been trying a few things now and it's working great. Thank you for your help.

On Mon, Feb 13, 2012 at 7:46 PM, Michael Miller wrote:

> Thanks Simon,
>
> Mathieu, I'm afraid that I may have misunderstood what you're trying to do.
> I assumed the timestamp was a static property of the document. The role
> of the map function is to render those static properties into a static
> index, and then to use dynamic start/stop keys at query time to do range
> queries. It's a common misperception to think that you are interacting
> with the map function at query time, but you aren't. You can only interact
> with the output of the map function, so you want to put the logic into the
> startkey/endkey to slice into the index appropriately. Are we on the right
> track?
>
> -M
>
> On Feb 13, 2012, at 4:33 PM, Simon Metson wrote:
>
> > Hi,
> > Do you mean how do you query the view for a given date? Once the
> > document is written I'd assume it has a fixed date, e.g. you'd do something
> > like:
> >> var d = new Date(Date.parse(doc.date));
> >
> > You don't want to dynamically generate the date in the view, as this
> > will be the date the view was created, not the date of the query or the
> > date associated with the data.
> > Cheers
> > Simon
> >
> > On Monday, 13 February 2012 at 21:27, Mathieu Castonguay wrote:
> >
> >> Thanks for the explanation Michael.
> >> This works great if you already know
> >> the value of the date, but if it's dynamic, how can I replace this line
> >>
> >> var d = new Date(Date.parse("2012-02-11T22:00:00"))
> >>
> >> with the date from the key? Can I access key[0] or something along those
> >> lines from inside my map function?
> >>
> >> On Mon, Feb 13, 2012 at 3:46 PM, Michael Miller <mike@cloudant.com> wrote:
> >>
> >>> Hi Mathieu,
> >>>
> >>> Sorry to jump in on this conversation late. This is a bit verbose, but
> >>> I've seen this question go by unanswered way too many times and decided to
> >>> be proactive.
> >>>
> >>> * Long story short: CouchDB is ideal for this, even on big data sets. It
> >>> will be fast at scale.
> >>>
> >>> * Details: When working with dates in CouchDB, I almost always find
> >>> myself using the following pattern:
> >>>
> >>> 1) Store the date-time in either epoch seconds or an ISO standard format, both
> >>> of which are convenient to work with in JavaScript (for server-side views
> >>> as well as client applications). Your choice of ISO 8601 format works
> >>> nicely with the JS Date class:
> >>>
> >>> var d = new Date(Date.parse("2012-02-11T22:00:00"))
> >>>
> >>> 2) Then, in preparation for future reduces you will likely end up wanting,
> >>> I'd use a compound key structured like:
> >>> [userId, year, month, day]
> >>>
> >>> So, the map code would be:
> >>>
> >>> function(doc){
> >>>   if (doc && doc.userId && doc.timeScheduled && doc.dollarValue) {
> >>>     var d = new Date(Date.parse("2012-02-11T22:00:00"));
> >>>     //note, Month runs [0,11]
> >>>     emit( [doc.userId, d.getFullYear(), d.getMonth(), d.getDate()],
> >>>       doc.dollarValue);
> >>>   }
> >>> }
> >>>
> >>> where I've assumed that you may want to aggregate on some fictitious
> >>> doc.dollarValue numerical field. For that, you would add to your design
> >>> document a builtin reduce function:
> >>>
> >>> "reduce": "_stats"
> >>>
> >>> to get the sum, count, min value, max value and sum of squares, from which
> >>> the mean and std-dev can be derived.
> >>> Let's suppose we call this view "idByTime" and it lives in the design doc
> >>> called "selectors".
> >>>
> >>> 3) Now, to query this for the SELECT you want you would do:
> >>>
> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/selectors/_view/idByTime?reduce=false&startkey=\["bob",2012,0,1\]&endkey=\["bob",2012,0,25\]'
> >>>
> >>> to get the list of document ids that fall between Jan 1, 2012 and Jan 25,
> >>> 2012 for user id "bob".
> >>>
> >>> Now, if you want to get the full documents, you can just change that to:
> >>>
> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/selectors/_view/idByTime?reduce=false&startkey=\["bob",2012,0,1\]&endkey=\["bob",2012,0,25\]&include_docs=true'
> >>>
> >>> 4) Now, the real fun comes when you use that same index to do
> >>> query-time rollup that's super fast. For this, the thing you want to note
> >>> is the group_level option at query time. If you have a key of 'n'
> >>> dimensions (n=4 in our case), then you can roll it up from dimensionality
> >>> n=0 through n=4. So, at full dimensionality:
> >>>
> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/selectors/_view/idByTime?group_level=4'
> >>>
> >>> will give you the values for all users aggregated by day. You can add
> >>> startkey and endkey just as before to slice into the range.
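The startkey/endkey slicing described above only works when the keys are sent as JSON and percent-encoded in the URL. A minimal sketch of building such a query string in JavaScript follows. Note that viewUrl is a hypothetical helper written for this illustration, not part of CouchDB or Cloudant, and that the base URL is the placeholder from the message, using the "selectors" design doc name given there.

```javascript
// Hypothetical helper for illustration only (not a CouchDB or Cloudant API).
// Every value is JSON-encoded, which suits startkey, endkey, reduce,
// group_level and include_docs, and is then percent-encoded for the URL.
function viewUrl(base, params) {
  var parts = Object.keys(params).map(function (name) {
    return name + "=" + encodeURIComponent(JSON.stringify(params[name]));
  });
  return base + "?" + parts.join("&");
}

// Placeholder URL taken from the message above.
var base = "http://demo.cloudant.com/dbname/_design/selectors/_view/idByTime";

// Daily stats for user "bob" between Jan 1 and Jan 25, 2012 (month is 0-based).
var dailyUrl = viewUrl(base, {
  group_level: 4,
  startkey: ["bob", 2012, 0, 1],
  endkey: ["bob", 2012, 0, 25]
});

console.log(dailyUrl);
```

Encoding the whole JSON value avoids the shell escaping of brackets and quotes that the curl examples need. Parameters whose values are not JSON would need separate handling, which this sketch leaves out.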
> >>>
> >>> Now if you want to roll it up by user/year/month:
> >>>
> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/selectors/_view/idByTime?group_level=3'
> >>>
> >>> by user/year:
> >>>
> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/selectors/_view/idByTime?group_level=2'
> >>>
> >>> by user:
> >>>
> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/selectors/_view/idByTime?group_level=1'
> >>>
> >>> and ultimately roll up over all users:
> >>>
> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/selectors/_view/idByTime?group_level=0'
> >>>
> >>> Note that group_level=0 => "group=false", and group_level = n =>
> >>> "group=true" in the view query options at:
> >>>
> >>> http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options.
> >>>
> >>> I prefer to just be explicit with the group_level and forget that
> >>> group=true/false exists.
> >>>
> >>> Thanks, Mike
> >>>
> >>> p.s., apologies for any typos, I was cribbing this from some Cloudant
> >>> blog-posts in the making.
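The map function in the message above parses a literal date string. As the later replies in this thread point out, the key should come from each document's own timestamp. Here is a minimal sketch of that, assuming timeScheduled holds an ISO 8601 string with an explicit UTC designator. The emit stand-in and the sample document are hypothetical, added only so the code runs outside CouchDB.

```javascript
// Stand-in for CouchDB's emit(), so the sketch runs outside the view server.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map function keyed on [userId, year, month, day], derived from the
// document's own timeScheduled field rather than a literal date.
// The UTC getters are an assumption made here so that the key does not
// depend on the server's local time zone.
function map(doc) {
  if (doc && doc.userId && doc.timeScheduled && doc.dollarValue) {
    var d = new Date(Date.parse(doc.timeScheduled));
    if (isNaN(d.getTime())) return; // skip timestamps that do not parse
    // Note: months run 0..11
    emit([doc.userId, d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()],
         doc.dollarValue);
  }
}

// Hypothetical sample document.
map({ userId: "bob", timeScheduled: "2012-02-11T22:00:00Z", dollarValue: 42 });
console.log(JSON.stringify(emitted));
```

The index is built once from the stored documents. The date a caller is interested in goes into startkey and endkey at query time, never into the map function.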
> >>>
> >>> On Feb 13, 2012, at 11:11 AM, Mathieu Castonguay wrote:
> >>>
> >>>> I tried that exact example with
> >>>> ?startKey=["26de9c438e5d1c0f075f2ae6ad0bcc82","2012-02-11T22:00:00"]&endkey=["26de9c438e5d1c0f075f2ae6ad0bcc82",{}]
> >>>> and I still get records in the past:
> >>>>
> >>>> {"total_rows":3,"offset":0,"rows":[
> >>>> {"id":"344e921af796598bcd709ba973003c60","key":["26de9c438e5d1c0f075f2ae6ad0b39b2","2012-02-13T16:18:19.565+0000"],"value":"344e921af796598bcd709ba973003c60"},
> >>>> {"id":"344e921af796598bcd709ba973001d3f","key":["26de9c438e5d1c0f075f2ae6ad0bcc82","2012-02-10T21:44:14.920+0000"],"value":"344e921af796598bcd709ba973001d3f"},
> >>>> {"id":"344e921af796598bcd709ba973002c01","key":["26de9c438e5d1c0f075f2ae6ad0bcc82","2012-02-10T22:05:48.218+0000"],"value":"344e921af796598bcd709ba973002c01"}
> >>>> ]}
> >>>>
> >>>> The view's map function is:
> >>>>
> >>>> function(doc) { if(doc.userId && doc.timeScheduled)
> >>>> {emit([doc.userId,doc.timeScheduled], doc._id)} }
> >>>>
> >>>> On Mon, Feb 13, 2012 at 1:55 PM, James Klo <jim.klo@sri.com> wrote:
> >>>>
> >>>>> Not sure how you are querying, but are you doing the equivalent to this?
> >>>>> startkey and endkey should be expressed as JSON
> >>>>>
> >>>>> curl -g 'http://localhost:5984/orders/_design/Order/_view/by_users_after_time?startkey=["f98ba9a518650a6c15c566fc6f00c157","2012-01-01T11:40:52.280Z"]&endkey=["userid",{}]'
> >>>>>
> >>>>> *
> >>>>> Jim Klo
> >>>>> Senior Software Engineer
> >>>>> Center for Software Engineering
> >>>>> SRI International
> >>>>> e. jim.klo@sri.com
> >>>>> p. 805.542.9330 x121
> >>>>> m. 805.286.1350
> >>>>> f. 805.546.2444
> >>>>> *
> >>>>>
> >>>>> On Feb 13, 2012, at 10:27 AM, Mathieu Castonguay wrote:
> >>>>>
> >>>>> I tried reversing the keys with no luck.
> >>>>> I still get timestamps that are in
> >>>>> the past (before the startKey).
> >>>>>
> >>>>> On Sat, Feb 11, 2012 at 6:37 PM, James Klo <jim.klo@sri.com> wrote:
> >>>>>
> >>>>> Reverse the key. [userid, time]
> >>>>>
> >>>>> CouchDB is all about understanding collation. Basically views are
> >>>>> sorted/grouped from left to right alphanumeric. See
> >>>>> http://wiki.apache.org/couchdb/View_collation for the finer details, as
> >>>>> there are more rules than the basics I mention.
> >>>>>
> >>>>> So the reversal sorts the view by userid first, then date as string,
> >>>>> instead of sorting by dates then userids.
> >>>>>
> >>>>> You do it this way because you know the exact userid, but not the exact
> >>>>> date. If you knew the exact date, but not the userid, what you have
> >>>>> currently would be better.
> >>>>>
> >>>>> - Jim
> >>>>>
> >>>>> On Feb 11, 2012, at 1:54 PM, "Mathieu Castonguay" <mcastonguay@justlexit.com> wrote:
> >>>>>
> >>>>> I have a simple document structure named Order with the fields id, name,
> >>>>> userId and timeScheduled.
> >>>>>
> >>>>> What I would like to do is create a view where I can find the
> >>>>> document.id for those whose userId is some value and timeScheduled is
> >>>>> after a given date.
> >>>>>
> >>>>> My view:
> >>>>>
> >>>>> "by_users_after_time": {
> >>>>> "map": "function(doc) { if (doc.userId && doc.timeScheduled) {
> >>>>> emit([doc.timeScheduled, doc.userId], doc._id); }}"
> >>>>> }
> >>>>>
> >>>>> If I do
> >>>>> localhost:5984/orders/_design/Order/_view/by_users_after_time?startKey="[2012-01-01T11:40:52.280Z,f98ba9a518650a6c15c566fc6f00c157]"
> >>>>>
> >>>>> I get every result back.
> >>>>> Is there a way to access key[1] to do an if
> >>>>> doc.userId == key[1] or something along those lines and simply emit on
> >>>>> the time?
> >>>>>
> >>>>> This would be the SQL equivalent of select id from Order where userId =
> >>>>> "f98ba9a518650a6c15c566fc6f00c157" and timeScheduled >
> >>>>> 2012-01-01T11:40:52.280Z;
> >>>>>
> >>>>> I did quite a few Google searches but I can't seem to find a good
> >>>>> tutorial on working with multiple keys. It's also possible that my
> >>>>> approach is entirely flawed, so any guidance would be appreciated.
> >>>>>
> >>>>> Thank you,
> >>>>>
> >>>>> Matt
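The thread turns on how a [userId, timeScheduled] key is collated and then sliced by startkey/endkey. The sketch below simulates that range selection in plain JavaScript. The collate function is a simplified stand-in written for this illustration: real CouchDB collation uses ICU string comparison and has more rules, and rangeQuery is a hypothetical helper, not a CouchDB API. The first three rows are the ones quoted in the thread; the fourth is a hypothetical row added so that the range has a match.

```javascript
// Simplified type ordering: null < false < true < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects sort last, which is why {} works as a high sentinel
}

// Simplified comparison; strings are compared by code unit, not by ICU rules.
function collate(a, b) {
  var ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    for (var i = 0; i < Math.min(a.length, b.length); i++) {
      var c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // identical scalars; objects are not compared in this sketch
}

// Hypothetical helper: sort rows by key, keep those within the inclusive range.
function rangeQuery(rows, startkey, endkey) {
  return rows.slice()
    .sort(function (x, y) { return collate(x.key, y.key); })
    .filter(function (row) {
      var afterStart = startkey === undefined || collate(row.key, startkey) >= 0;
      var beforeEnd = endkey === undefined || collate(row.key, endkey) <= 0;
      return afterStart && beforeEnd;
    });
}

var user = "26de9c438e5d1c0f075f2ae6ad0bcc82";
var rows = [
  { id: "344e921af796598bcd709ba973003c60", key: ["26de9c438e5d1c0f075f2ae6ad0b39b2", "2012-02-13T16:18:19.565+0000"] },
  { id: "344e921af796598bcd709ba973001d3f", key: [user, "2012-02-10T21:44:14.920+0000"] },
  { id: "344e921af796598bcd709ba973002c01", key: [user, "2012-02-10T22:05:48.218+0000"] },
  { id: "hypothetical-later-order", key: [user, "2012-02-12T09:00:00.000+0000"] } // not from the thread
];

var both = rangeQuery(rows, [user, "2012-02-11T22:00:00"], [user, {}]);
var endOnly = rangeQuery(rows, undefined, [user, {}]);
console.log(both.map(function (r) { return r.id; }));
console.log(endOnly.length);
```

With both bounds applied, only the hypothetical later row is selected, and none of the three rows from the thread would match. With only the upper bound applied, every row comes back, which is the result reported in the thread. One possible reading, an inference rather than something stated in the thread, is that the capitalized startKey parameter was not recognized, since the documented parameter name is startkey.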