incubator-couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Question about multiple keys with ranges
Date Tue, 14 Feb 2012 11:17:42 GMT
It's "startkey", not "startKey"; the parameter names are case-sensitive.
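
For reference, here is a minimal sketch of building the query string with the lowercase parameter names. The helper name viewUrl is hypothetical, and the base URL and user id are taken from the messages quoted below. Each key is JSON-encoded and then URL-encoded. Judging from the results quoted below, a misspelled parameter seems to be ignored rather than rejected, which would explain why every row came back.

```javascript
// Hypothetical helper: build a view URL with lowercase, JSON-encoded range keys.
function viewUrl(base, startkey, endkey) {
  var params = [
    "startkey=" + encodeURIComponent(JSON.stringify(startkey)),
    "endkey=" + encodeURIComponent(JSON.stringify(endkey))
  ];
  return base + "?" + params.join("&");
}

// Base URL and user id as they appear in the quoted messages.
var base = "http://localhost:5984/orders/_design/Order/_view/by_users_after_time";
var user = "26de9c438e5d1c0f075f2ae6ad0bcc82";

// getMonth() is zero-based, so 1 is February and 3 is April.
var url = viewUrl(base, [user, 2012, 1, 11], [user, 2012, 3, 25]);
console.log(url);
```

Building the keys as arrays and serializing them also avoids hand-typing the brackets and quotes in the URL.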

On 14 February 2012 01:54, Mathieu Castonguay <mcastonguay@justlexit.com> wrote:
> Actually disregard that, it's still not working... :(
>
> The view:
>
> function(doc) {
>   if (doc.userId && doc.timeScheduled) {
>     var d = new Date(Date.parse(doc.timeScheduled));
>     emit([doc.userId, d.getFullYear(), d.getMonth(), d.getDate()], doc._id);
>   }
> }
>
>
>
> When I do ?startKey=["226de9c438e5d1c0f075f2ae6ad0bcc82",2012,1,11]
>
> I get these results, where the date parts of every key come back as null:
>
> {"id":"344e921af796598bcd709ba973003c60","key":["26de9c438e5d1c0f075f2ae6ad0b39b2",null,null,null],"value":"344e921af796598bcd709ba973003c60"},
> {"id":"344e921af796598bcd709ba973004cd9","key":["26de9c438e5d1c0f075f2ae6ad0b39b2",null,null,null],"value":"344e921af796598bcd709ba973004cd9"},
> {"id":"344e921af796598bcd709ba973001d3f","key":["26de9c438e5d1c0f075f2ae6ad0bcc82",null,null,null],"value":"344e921af796598bcd709ba973001d3f"},
> {"id":"344e921af796598bcd709ba973002c01","key":["26de9c438e5d1c0f075f2ae6ad0bcc82",null,null,null],"value":"344e921af796598bcd709ba973002c01"}
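
The nulls are consistent with Date.parse failing on the stored string: it then returns NaN, every getter on the resulting Date returns NaN, and JSON has no NaN, so each one is written out as null. Whether the view server's JavaScript engine really rejects strings like "2012-02-13T16:18:19.565+0000" is an assumption here, not something confirmed in this thread. The sketch below shows the NaN-to-null effect and a hypothetical helper, dateParts, that reads the date fields straight from the leading YYYY-MM-DD of the string instead of relying on Date.parse.

```javascript
// An unparseable string makes Date.parse return NaN ...
var bad = new Date(Date.parse("not a date"));
var key = ["someUser", bad.getFullYear(), bad.getMonth(), bad.getDate()];

// ... and JSON.stringify writes each NaN as null.
var serialized = JSON.stringify(key);
console.log(serialized);

// Hypothetical helper: take year, zero-based month and day from the
// leading "YYYY-MM-DD" of the string, as written (no time zone conversion).
function dateParts(iso) {
  var m = /^(\d{4})-(\d{2})-(\d{2})/.exec(String(iso));
  if (!m) { return null; }
  return [parseInt(m[1], 10), parseInt(m[2], 10) - 1, parseInt(m[3], 10)];
}

var parts = dateParts("2012-02-13T16:18:19.565+0000");
console.log(parts);
```

Inside a map function, one could skip the emit when dateParts returns null rather than indexing a key full of nulls.
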
>
> If I do the full thing with the end key:
> ?startKey=["226de9c438e5d1c0f075f2ae6ad0bcc82",2012,1,11]&endkey=["226de9c438e5d1c0f075f2ae6ad0bcc82",2012,3,25]
>
> I get no results:
>
> {"total_rows":4,"offset":0,"rows":[]}
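
Note also that the user id in these keys starts with "226de9c4..." while the ids in the rows above start with "26de9c4...". With the start key not taking effect, an end key carrying that extra leading "2" sorts before every row, which would give an empty result. The comparison below is a simplified sketch of view collation, not CouchDB's actual implementation: it covers only null, numbers, strings and arrays, and compares strings by code unit rather than with ICU.

```javascript
// Simplified type order for this sketch: null < numbers < strings < arrays.
function rank(v) {
  if (v === null) { return 0; }
  if (typeof v === "number") { return 1; }
  if (typeof v === "string") { return 2; }
  if (Array.isArray(v)) { return 3; }
  throw new Error("type not covered by this sketch");
}

// Returns a negative number, zero, or a positive number.
function compareKeys(a, b) {
  var ra = rank(a), rb = rank(b);
  if (ra !== rb) { return ra < rb ? -1 : 1; }
  if (ra === 0) { return 0; }
  if (ra === 1 || ra === 2) { return a < b ? -1 : (a > b ? 1 : 0); }
  // Arrays: compare element by element; a shorter prefix sorts first.
  for (var i = 0; i < Math.min(a.length, b.length); i++) {
    var c = compareKeys(a[i], b[i]);
    if (c !== 0) { return c; }
  }
  return a.length - b.length;
}

var rowKey = ["26de9c438e5d1c0f075f2ae6ad0bcc82", null, null, null];
var endkey = ["226de9c438e5d1c0f075f2ae6ad0bcc82", 2012, 3, 25];

// true: the row sorts after the end key, so it falls outside the range.
console.log(compareKeys(rowKey, endkey) > 0);
```
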
>
>
> On Mon, Feb 13, 2012 at 8:18 PM, Mathieu Castonguay <
> mcastonguay@justlexit.com> wrote:
>
>> Yes, it was me that misunderstood your example, I've been trying a few
>> things now and it's working great, thank you for your help.
>>
>>
>> On Mon, Feb 13, 2012 at 7:46 PM, Michael Miller <mike@cloudant.com> wrote:
>>
>>> Thanks Simon,
>>>
>>> Mathieu, I'm afraid that I may have misunderstood what you're trying to
>>> do.  I assumed the timestamp was a static property of the document.  The
>>> role of the map function is to render those static properties into a static
>>> index, and then to use dynamic start/stop keys at query time to do range
>>> queries.   It's a common misperception to think that you are interacting
>>> with the map function at query time, but you aren't.  You can only interact
>>> with the output of the map function, so you want to put the logic into the
>>> startkey/endkey to slice into the index appropriately.  Are we on the right
>>> track?
>>>
>>> -M
>>>
>>> On Feb 13, 2012, at 4:33 PM, Simon Metson wrote:
>>>
>>> > Hi,
>>> > Do you mean how do you query the view for a given date? Once the
>>> > document is written I'd assume it has a fixed date, e.g. you'd do something
>>> > like:
>>> >> var d = new Date(Date.parse(doc.date));
>>> >>
>>> >>
>>> >
>>> >
>>> > You don't want to dynamically generate the date in the view, as this
>>> > will be the date the view was created, not the date of the query or the
>>> > date associated with the data.
>>> > Cheers
>>> > Simon
>>> >
>>> >
>>> > On Monday, 13 February 2012 at 21:27, Mathieu Castonguay wrote:
>>> >
>>> >> Thanks for the explanation Michael. This works great if you already know
>>> >> the value of the date, but if it's dynamic, how can I replace this line
>>> >>
>>> >> var d = new Date(Date.parse("2012-02-11T22:00:00"))
>>> >>
>>> >> with the date from the key? Can I access key[0] or something along those
>>> >> lines from inside my map function?
>>> >>
>>> >> On Mon, Feb 13, 2012 at 3:46 PM, Michael Miller <mike@cloudant.com> wrote:
>>> >>
>>> >>> Hi Mathieu,
>>> >>>
>>> >>> Sorry to jump in on this conversation late. This is a bit verbose, but
>>> >>> I've seen this question go by unanswered way too many times and decided to
>>> >>> be proactive.
>>> >>>
>>> >>> * Long story short: CouchDB is ideal for this, even on big data sets. It
>>> >>> will be fast at scale.
>>> >>>
>>> >>> * Details: When working with dates in CouchDB, I almost always find
>>> >>> myself using the following pattern:
>>> >>>
>>> >>> 1) Store the date-time in either epoch seconds or an ISO standard format, both
>>> >>> of which are convenient to work with in JavaScript (for server-side views
>>> >>> as well as client applications). Your choice of ISO 8601 formatting works
>>> >>> nicely with the JS Date class:
>>> >>>
>>> >>> var d = new Date(Date.parse("2012-02-11T22:00:00"))
>>> >>>
>>> >>> 2) Then, in preparation for the reduces you will likely end up wanting,
>>> >>> I'd use a compound key structured like:
>>> >>> [<userId>, year, month, day]
>>> >>>
>>> >>> So, the map code would be:
>>> >>>
>>> >>> function(doc) {
>>> >>>   if (doc && doc.userId && doc.timeScheduled && doc.dollarValue) {
>>> >>>     var d = new Date(Date.parse("2012-02-11T22:00:00"));
>>> >>>     // note: getMonth() runs [0,11]
>>> >>>     emit([doc.userId, d.getFullYear(), d.getMonth(), d.getDate()],
>>> >>>          doc.dollarValue);
>>> >>>   }
>>> >>> }
>>> >>>
>>> >>> where I've assumed that you may want to aggregate on some fictitious
>>> >>> doc.dollarValue numerical field. For that, you would add to your design
>>> >>> document a builtin reduce function:
>>> >>>
>>> >>> "reduce": "_stats"
>>> >>>
>>> >>> to get the sum, count, min value, max value and sum of squares (from which the mean and std-dev can be derived). Let's
>>> >>> suppose we call this view "idByTime" and it lives in the design_doc called
>>> >>> "sectors".
>>> >>>
>>> >>> 3) Now, to query this for the SELECT you want you would do:
>>> >>>
>>> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?reduce=false&startkey=\["bob",2012,0,1\]&endkey=\["bob",2012,0,25\]'
>>> >>>
>>> >>> to get the list of document ids that fall between Jan 1, 2012 and Jan 25,
>>> >>> 2012 for user id "bob".
>>> >>>
>>> >>> Now, if you want to get the full documents, you can just change that to:
>>> >>>
>>> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?reduce=false&startkey=\["bob",2012,0,1\]&endkey=\["bob",2012,0,25\]&include_docs=true'
>>> >>>
>>> >>> 4) Now, the real fun comes when you can use that same index to do
>>> >>> query-time rollup that's super fast. For this the thing you want to note
>>> >>> is the group_level option at query time. If you have a key of 'n'
>>> >>> dimensions (n=4 in our case), then you can roll it up from dimensionality
>>> >>> n=0 through n=4. So, at full dimensionality:
>>> >>>
>>> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?group_level=4'
>>> >>>
>>> >>> will give you the values for all users aggregated by day. You can add
>>> >>> startkey and endkey just as before to slice into the range.
>>> >>>
>>> >>> Now if you want to roll it up by user/year/month:
>>> >>>
>>> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?group_level=3'
>>> >>>
>>> >>> by user/year:
>>> >>>
>>> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?group_level=2'
>>> >>>
>>> >>> by user:
>>> >>>
>>> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?group_level=1'
>>> >>>
>>> >>> and ultimately roll up over all users:
>>> >>>
>>> >>> curl -X GET 'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?group_level=0'
>>> >>>
>>> >>> Note that group_level=0 => "group=false", and group_level=n =>
>>> >>> "group=true" in the view query options at:
>>> >>>
>>> >>> http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options.
>>> >>>
>>> >>> I prefer to just be explicit with the group_level and forget that
>>> >>> group=true/false exists.
>>> >>>
>>> >>> Thanks, Mike
>>> >>>
>>> >>> p.s., apologies for any typos, I was cribbing this from some cloudant
>>> >>> blog-posts in the making.
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Feb 13, 2012, at 11:11 AM, Mathieu Castonguay wrote:
>>> >>>
>>> >>>> I tried that exact example with
>>> >>>> ?startKey=["26de9c438e5d1c0f075f2ae6ad0bcc82","2012-02-11T22:00:00"]&endkey=["26de9c438e5d1c0f075f2ae6ad0bcc82",{}]
>>> >>>> and I still get records in the past:
>>> >>>>
>>> >>>> {"total_rows":3,"offset":0,"rows":[
>>> >>>> {"id":"344e921af796598bcd709ba973003c60","key":["26de9c438e5d1c0f075f2ae6ad0b39b2","2012-02-13T16:18:19.565+0000"],"value":"344e921af796598bcd709ba973003c60"},
>>> >>>> {"id":"344e921af796598bcd709ba973001d3f","key":["26de9c438e5d1c0f075f2ae6ad0bcc82","2012-02-10T21:44:14.920+0000"],"value":"344e921af796598bcd709ba973001d3f"},
>>> >>>> {"id":"344e921af796598bcd709ba973002c01","key":["26de9c438e5d1c0f075f2ae6ad0bcc82","2012-02-10T22:05:48.218+0000"],"value":"344e921af796598bcd709ba973002c01"}
>>> >>>> ]}
>>> >>>>
>>> >>>>
>>> >>>> The view's map function is:
>>> >>>>
>>> >>>> function(doc) { if(doc.userId && doc.timeScheduled)
>>> >>>> {emit([doc.userId,doc.timeScheduled], doc._id)} }
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Mon, Feb 13, 2012 at 1:55 PM, James Klo <jim.klo@sri.com> wrote:
>>> >>>>
>>> >>>>> Not sure how you are querying, but are you doing the equivalent to this?
>>> >>>>> startkey and endkey should be expressed as JSON
>>> >>>>>
>>> >>>>> curl -g 'http://localhost:5984/orders/_design/Order/_view/by_users_after_time?startkey=["f98ba9a518650a6c15c566fc6f00c157","2012-01-01T11:40:52.280Z"]&endkey=["userid",{}]'
>>> >>>>>
>>> >>>>>
>>> >>>>> *
>>> >>>>> Jim Klo
>>> >>>>> Senior Software Engineer
>>> >>>>> Center for Software Engineering
>>> >>>>> SRI International
>>> >>>>> e. jim.klo@sri.com
>>> >>>>> p. 805.542.9330 x121
>>> >>>>> m. 805.286.1350
>>> >>>>> f. 805.546.2444
>>> >>>>> *
>>> >>>>>
>>> >>>>> On Feb 13, 2012, at 10:27 AM, Mathieu Castonguay wrote:
>>> >>>>>
>>> >>>>> I tried reversing the keys with no luck. I still get timestamps that are in
>>> >>>>> the past (before the startKey).
>>> >>>>>
>>> >>>>> On Sat, Feb 11, 2012 at 6:37 PM, James Klo <jim.klo@sri.com> wrote:
>>> >>>>>
>>> >>>>> Reverse the key. [userid, time]
>>> >>>>>
>>> >>>>>
>>> >>>>> CouchDB is all about understanding collation. Basically views are
>>> >>>>> sorted/grouped from left to right, alphanumerically. See
>>> >>>>> http://wiki.apache.org/couchdb/View_collation for the finer details, as
>>> >>>>> there are more rules than the basics I mention.
>>> >>>>>
>>> >>>>> So the reversal sorts the view by userid first, then by date as a string,
>>> >>>>> instead of sorting by date and then userid.
>>> >>>>>
>>> >>>>> You do it this way because you know the exact userid, but not the exact
>>> >>>>> date. If you knew the exact date, but not the userid, what you have
>>> >>>>> currently would be better.
>>> >>>>>
>>> >>>>>
>>> >>>>> - Jim
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> Sent from my iPad
>>> >>>>>
>>> >>>>>
>>> >>>>> On Feb 11, 2012, at 1:54 PM, "Mathieu Castonguay" <mcastonguay@justlexit.com> wrote:
>>> >>>>>
>>> >>>>>
>>> >>>>> I have a simple document structure named Order with the fields id, name,
>>> >>>>> userId and timeScheduled.
>>> >>>>>
>>> >>>>> What I would like to do is create a view where I can find the
>>> >>>>> document.id for those whose userId is some value and timeScheduled is
>>> >>>>> after a given date.
>>> >>>>>
>>> >>>>>
>>> >>>>> My view:
>>> >>>>>
>>> >>>>>
>>> >>>>> "by_users_after_time": {
>>> >>>>>   "map": "function(doc) { if (doc.userId && doc.timeScheduled) { emit([doc.timeScheduled, doc.userId], doc._id); }}"
>>> >>>>> }
>>> >>>>>
>>> >>>>>
>>> >>>>> If I do
>>> >>>>> localhost:5984/orders/_design/Order/_view/by_users_after_time?startKey="[2012-01-01T11:40:52.280Z,f98ba9a518650a6c15c566fc6f00c157]"
>>> >>>>> I get every result back. Is there a way to access key[1] to do an if
>>> >>>>> doc.userId == key[1] or something along those lines and simply emit on
>>> >>>>> the time?
>>> >>>>>
>>> >>>>>
>>> >>>>> This would be the SQL equivalent of select id from Order where userId =
>>> >>>>> "f98ba9a518650a6c15c566fc6f00c157" and timeScheduled >
>>> >>>>> 2012-01-01T11:40:52.280Z;
>>> >>>>>
>>> >>>>> I did quite a few Google searches but I can't seem to find a good tutorial
>>> >>>>> on working with multiple keys. It's also possible that my approach is
>>> >>>>> entirely flawed so any guidance would be appreciated.
>>> >>>>>
>>> >>>>>
>>> >>>>> Thank you,
>>> >>>>>
>>> >>>>>
>>> >>>>> Matt
>>> >
>>>
>>>
>>
