From: Zachary Zolton
Date: Fri, 18 Mar 2011 09:14:10 -0500
Subject: Re: Paging large result sets with sorting
To: Justin Walgran
Cc: user@couchdb.apache.org

I've made good use of CouchDB-Lucene in the past, but haven't had a
chance to play around with ElasticSearch.

Another alternative would be to schedule a background process to
create a summary document for each month's data.

On Fri, Mar 18, 2011 at 8:41 AM, Justin Walgran wrote:
> Thanks for the suggestion, Zach. The problem I'm running into is that
> there are too many results to sort quickly in a list function or on
> the client.
>
> It is looking more and more like hooking up some flavor of Lucene may
> be the only way to solve this problem.
>
> Does anyone have recommendations on using ElasticSearch vs. CouchDB-Lucene?
>
> Justin
>
> On Thu, Mar 17, 2011 at 5:23 PM, Zachary Zolton wrote:
>> Justin,
>>
>> Depending on your intended usage, it may be acceptable to just use the
>> view to filter by the desired month and then perform your sort in
>> client-side code. Alternatively, you could do the sorting server-side
>> in a _list function, but this may put quite a burden on your CouchDB
>> server if you're making a high volume of these queries.
>>
>> Also, CouchDB-Lucene is very capable of querying ranges in one field
>> while sorting on an additional field.
>>
>> Cheers,
>>
>> Zach
>>
>> On Thu, Mar 17, 2011 at 3:34 PM, Justin Walgran wrote:
>>> I'm sorry, I oversimplified my problem statement. Your solution is
>>> correct if I only need to select by month. Unfortunately I also need
>>> to support an arbitrary inspection date range for filtering results.
>>> February 6th to March 14th for example. This is where the trouble
>>> creeps in.
>>>
>>> Justin
>>>
>>> On Thu, Mar 17, 2011 at 4:29 PM, Keith Gable wrote:
>>>> Then simply emit the name before the day of the month. Then, it'll
>>>> sort by name then day of month.
>>>>
>>>> On Thu, Mar 17, 2011 at 3:17 PM, Justin Walgran wrote:
>>>>> Thanks for the thoughtful reply, Keith.
>>>>>
>>>>> Assume these input docs:
>>>>>
>>>>>  { "inspection_date": "2011-03-01", "homeowner_name": "Bob" }
>>>>>
>>>>>  { "inspection_date": "2011-03-02", "homeowner_name": "Keith" }
>>>>>
>>>>>  { "inspection_date": "2011-03-03", "homeowner_name": "Alice" }
>>>>>
>>>>> The key output from
>>>>> by_inspection_date_and_homeowner_name?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>> would be:
>>>>>
>>>>>  [2011,3,1,"Bob"]
>>>>>  [2011,3,2,"Keith"]
>>>>>  [2011,3,3,"Alice"]
>>>>>
>>>>> Which is not sorted by home owner name. That's the gotcha.
>>>>>
>>>>>
>>>>> Justin
>>>>>
>>>>> On Thu, Mar 17, 2011 at 2:13 PM, Keith Gable wrote:
>>>>>> Uh. This sounds simple?
>>>>>>
>>>>>> view: by_home_owner_name:
>>>>>> if (doc.home_owner_name) { emit(doc.home_owner_name, 1); }
>>>>>>
>>>>>> view: by_inspection_date:
>>>>>> if (doc.inspection_date) {
>>>>>>   var d = new Date(doc.inspection_date);
>>>>>>   emit([ d.getFullYear(), d.getMonth() + 1, d.getDate() ], 1);
>>>>>> }
>>>>>>
>>>>>> To look for all of my inspections:
>>>>>> ...by_home_owner_name?key="Keith Gable"
>>>>>>
>>>>>> To get all of the inspections for today:
>>>>>> ...by_inspection_date?reduce=false&key=[2011,3,17]
>>>>>>
>>>>>> To get all of the inspections for this month:
>>>>>> ...by_inspection_date?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>>>
>>>>>>
>>>>>> Combining the two:
>>>>>>
>>>>>> view: by_inspection_date_and_homeowner_name:
>>>>>> if (doc.inspection_date && doc.homeowner_name) {
>>>>>>   var d = new Date(doc.inspection_date);
>>>>>>   emit([ d.getFullYear(), d.getMonth() + 1, d.getDate(),
>>>>>>     doc.homeowner_name ], 1);
>>>>>> }
>>>>>>
>>>>>> ...by_inspection_date_and_homeowner_name?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>>>
>>>>>> Will result in:
>>>>>> [2011,3,1,"Alice"]
>>>>>> [2011,3,1,"Bob"]
>>>>>> [2011,3,2,"Keith"]
>>>>>>
>>>>>>
>>>>>> Does any of that not do what you want?
>>>>>>
>>>>>> On Thu, Mar 17, 2011 at 12:33 PM, Justin Walgran wrote:
>>>>>>> Assume a CouchDB storing and indexing housing inspection records. Each
>>>>>>> inspection document has two important fields.
>>>>>>>
>>>>>>>  - Home owner name
>>>>>>>  - Inspection date
>>>>>>>
>>>>>>> There are about 15,000 inspection documents generated per month.
>>>>>>>
>>>>>>> I need to quickly retrieve a list of inspections for January, sorted
>>>>>>> by home owner name.
>>>>>>>
>>>>>>> The issue I am running into is the fact that the size of the result
>>>>>>> set requires paging the data using limit and startkey.
>>>>>>> This would require that the view key be the inspection date, which
>>>>>>> means the results cannot be sorted by home owner name. The size of
>>>>>>> the data means that pulling it all down to the client and sorting
>>>>>>> in the browser is not performant.
>>>>>>>
>>>>>>> Is there a clever way to solve this problem?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Justin
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Keith Gable
>>>>>> A+ Certified Professional
>>>>>> Network+ Certified Professional
>>>>>> Web Developer
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Keith Gable
>>>> A+ Certified Professional
>>>> Network+ Certified Professional
>>>> Web Developer
>>>>
>>>
>>
>
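
For anyone reading this thread later, here is a rough sketch of Keith's
"emit the name before the day" idea combined with limit/startkey paging.
It runs in plain JavaScript and only imitates a view index in memory, so
buildView, fetchPage and compareKeys are hypothetical helpers, not CouchDB
APIs, and the key comparison is a simplification of real view collation.
The date string is split by hand rather than passed to new Date(), which
is an assumption of this sketch to keep the day independent of time zone.

```javascript
// Sample documents from the thread.
const docs = [
  { inspection_date: "2011-03-01", homeowner_name: "Bob" },
  { inspection_date: "2011-03-02", homeowner_name: "Keith" },
  { inspection_date: "2011-03-03", homeowner_name: "Alice" },
];

// Map function with the name emitted before the day: [year, month, name, day].
function mapByMonthThenName(doc, emit) {
  if (doc.inspection_date && doc.homeowner_name) {
    const [year, month, day] = doc.inspection_date.split("-").map(Number);
    emit([year, month, doc.homeowner_name, day], 1);
  }
}

// Element-by-element array comparison. Real CouchDB collation also orders
// mixed types and compares strings with ICU; this only handles the
// number and ASCII-string keys used here.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Hypothetical helper: run the map function over all docs and sort the
// emitted rows by key, roughly what a view index provides.
function buildView(allDocs, mapFn) {
  const rows = [];
  for (const doc of allDocs) {
    mapFn(doc, (key, value) => rows.push({ key, value }));
  }
  return rows.sort((a, b) => compareKeys(a.key, b.key));
}

// Hypothetical helper imitating startkey + limit paging within one month.
// It reads limit + 1 rows and hands back the extra row's key as the
// startkey for the next page.
function fetchPage(rows, year, month, limit, startkey) {
  const inMonth = rows.filter((r) => r.key[0] === year && r.key[1] === month);
  const from = startkey
    ? inMonth.findIndex((r) => compareKeys(r.key, startkey) >= 0)
    : 0;
  if (from === -1) return { rows: [], next: null };
  const slice = inMonth.slice(from, from + limit + 1);
  return {
    rows: slice.slice(0, limit),
    next: slice.length > limit ? slice[limit].key : null,
  };
}

const view = buildView(docs, mapByMonthThenName);
const page1 = fetchPage(view, 2011, 3, 2);
const page2 = fetchPage(view, 2011, 3, 2, page1.next);
console.log(page1.rows.map((r) => r.key[2]));
console.log(page2.rows.map((r) => r.key[2]));
```

Two caveats. Rows that share the same name and day have identical keys, so
real paging would also need startkey_docid to resume at the right row. And
as Justin points out above, this ordering only holds inside a single month;
an arbitrary range such as February 6th to March 14th still cannot be
served sorted by name from this key.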