couchdb-user mailing list archives

From Zachary Zolton <zachary.zol...@gmail.com>
Subject Re: Paging large result sets with sorting
Date Fri, 18 Mar 2011 14:33:00 GMT
Yet another alternative would be to do the sort in client-side code
and then cache the result, using something like memcached. Or do the
sort in a _list function, and have an HTTP proxy (e.g. Squid, Varnish)
cache the results.

The idea is that you don't wanna have to do all this work for each
request, and there's a number of ways to tackle it... (^_^)
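
A minimal sketch of the sort-once-then-cache idea, under stated assumptions: the `Map` is only a stand-in for memcached, and `sortedPage`, `byNameThenDate`, and the `fetchRows` callback are hypothetical names, not part of CouchDB. The sample documents are the ones quoted further down in this thread.

```javascript
// Hypothetical in-memory cache; a real deployment might use memcached instead.
const cache = new Map();

// Order rows by homeowner name, using the inspection date as a tie-breaker.
function byNameThenDate(a, b) {
  if (a.homeowner_name !== b.homeowner_name) {
    return a.homeowner_name < b.homeowner_name ? -1 : 1;
  }
  if (a.inspection_date === b.inspection_date) return 0;
  return a.inspection_date < b.inspection_date ? -1 : 1;
}

// fetchRows is a hypothetical callback that returns the documents in a date
// range, for example by querying a view keyed on inspection date.
function sortedPage(startDate, endDate, page, pageSize, fetchRows) {
  const cacheKey = startDate + ".." + endDate;
  let sorted = cache.get(cacheKey);
  if (sorted === undefined) {
    // Fetch and sort only on a cache miss; later pages reuse the sorted array.
    sorted = fetchRows(startDate, endDate).slice().sort(byNameThenDate);
    cache.set(cacheKey, sorted);
  }
  const offset = page * pageSize;
  return sorted.slice(offset, offset + pageSize);
}

// Usage with the three sample documents from the thread.
const docs = [
  { inspection_date: "2011-03-01", homeowner_name: "Bob" },
  { inspection_date: "2011-03-02", homeowner_name: "Keith" },
  { inspection_date: "2011-03-03", homeowner_name: "Alice" }
];
let fetchCount = 0;
function fetchRows(start, end) {
  fetchCount += 1;
  return docs.filter(function (d) {
    return d.inspection_date >= start && d.inspection_date <= end;
  });
}

const firstPage = sortedPage("2011-03-01", "2011-03-31", 0, 2, fetchRows);
const secondPage = sortedPage("2011-03-01", "2011-03-31", 1, 2, fetchRows);
console.log(firstPage.map(function (d) { return d.homeowner_name; }));
console.log(secondPage.map(function (d) { return d.homeowner_name; }));
```

The second call is served from the cache, so the fetch and sort run once per date range rather than once per page. This sketch leaves out cache invalidation when new inspections arrive.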

On Fri, Mar 18, 2011 at 9:14 AM, Zachary Zolton
<zachary.zolton@gmail.com> wrote:
> I've made good use of CouchDB-Lucene in the past, but haven't had a
> chance to play around with ElasticSearch.
>
> Another alternative would be to schedule a background process to
> create a summary document for each month's data.
>
> On Fri, Mar 18, 2011 at 8:41 AM, Justin Walgran <jwalgran@azavea.com> wrote:
>> Thanks for the suggestion, Zach. The problem I'm running into is that
>> there are too many results to sort quickly in a _list function or on
>> the client.
>>
>> It is looking more and more like hooking up some flavor of Lucene may
>> be the only way to solve this problem.
>>
>> Does anyone have recommendations on using ElasticSearch vs. CouchDB-Lucene?
>>
>> Justin
>>
>> On Thu, Mar 17, 2011 at 5:23 PM, Zachary Zolton
>> <zachary.zolton@gmail.com> wrote:
>>> Justin,
>>>
>>> Depending on your intended usage, it may be acceptable to just use the
>>> view to filter by the desired month and then perform your sort in
>>> client-side code. Alternatively, you could do the sorting server-side
>>> in a _list function, but this may put quite a burden on your CouchDB
>>> server if you're making a high volume of these queries.
>>>
>>> Also, CouchDB-Lucene is very capable of querying ranges in one field
>>> while sorting on an additional field.
>>>
>>>
>>> Cheers,
>>>
>>> Zach
>>>
>>> On Thu, Mar 17, 2011 at 3:34 PM, Justin Walgran <jwalgran@azavea.com> wrote:
>>>> I'm sorry, I oversimplified my problem statement. Your solution is
>>>> correct if I only need to select by month. Unfortunately I also need
>>>> to support an arbitrary inspection date range for filtering results.
>>>> February 6th to March 14th, for example. This is where the trouble
>>>> creeps in.
>>>>
>>>> Justin
>>>>
>>>> On Thu, Mar 17, 2011 at 4:29 PM, Keith Gable <ziggy@ignition-project.com> wrote:
>>>>> Then simply emit the name before the day of the month. Then, it'll
>>>>> sort by name then day of month.
>>>>>
>>>>> On Thu, Mar 17, 2011 at 3:17 PM, Justin Walgran <jwalgran@azavea.com> wrote:
>>>>>> Thanks for the thoughtful reply, Keith.
>>>>>>
>>>>>> Assume these input docs:
>>>>>>
>>>>>>  { "inspection_date": "2011-03-01", "homeowner_name": "Bob" }
>>>>>>
>>>>>>  { "inspection_date": "2011-03-02", "homeowner_name": "Keith" }
>>>>>>
>>>>>>  { "inspection_date": "2011-03-03", "homeowner_name": "Alice" }
>>>>>>
>>>>>> The key output from
>>>>>> by_inspection_date_and_homeowner_name?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>>> would be:
>>>>>>
>>>>>>  [2011,3,1,"Bob"]
>>>>>>  [2011,3,2,"Keith"]
>>>>>>  [2011,3,3,"Alice"]
>>>>>>
>>>>>> Which is not sorted by home owner name. That's the gotcha.
>>>>>>
>>>>>>
>>>>>> Justin
>>>>>>
>>>>>> On Thu, Mar 17, 2011 at 2:13 PM, Keith Gable <ziggy@ignition-project.com> wrote:
>>>>>>> Uh. This sounds simple?
>>>>>>>
>>>>>>> view: by_home_owner_name:
>>>>>>> if (doc.homeowner_name) { emit(doc.homeowner_name, 1); }
>>>>>>>
>>>>>>> view: by_inspection_date:
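>>>>>>> if (doc.inspection_date) {
>>>>>>>   // A date-only ISO string parses as UTC in ES5+ engines, so read the
>>>>>>>   // parts back with the UTC getters to avoid an off-by-one day.
>>>>>>>   var d = new Date(doc.inspection_date);
>>>>>>>   emit([ d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate() ], 1);
>>>>>>> }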
>>>>>>>
>>>>>>> To look for all of my inspections:
>>>>>>> ...by_home_owner_name?key="Keith Gable"
>>>>>>>
>>>>>>> To get all of the inspections for today:
>>>>>>> ...by_inspection_date?reduce=false&key=[2011,3,17]
>>>>>>>
>>>>>>> To get all of the inspections for this month:
>>>>>>> ...by_inspection_date?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>>>>
>>>>>>>
>>>>>>> Combining the two:
>>>>>>>
>>>>>>> view: by_inspection_date_and_homeowner_name:
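>>>>>>> if (doc.inspection_date && doc.homeowner_name) {
>>>>>>>   // A date-only ISO string parses as UTC in ES5+ engines, so read the
>>>>>>>   // parts back with the UTC getters to avoid an off-by-one day.
>>>>>>>   var d = new Date(doc.inspection_date);
>>>>>>>   emit([ d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
>>>>>>>          doc.homeowner_name ], 1);
>>>>>>> }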
>>>>>>>
>>>>>>> ...by_inspection_date_and_homeowner_name?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>>>>
>>>>>>> Will result in:
>>>>>>> [2011,3,1,"Alice"]
>>>>>>> [2011,3,1,"Bob"]
>>>>>>> [2011,3,2,"Keith"]
>>>>>>>
>>>>>>>
>>>>>>> Does any of that not do what you want?
>>>>>>>
>>>>>>> On Thu, Mar 17, 2011 at 12:33 PM, Justin Walgran <jwalgran@azavea.com> wrote:
>>>>>>>> Assume a CouchDB database storing and indexing housing inspection
>>>>>>>> records. Each inspection document has two important fields.
>>>>>>>>
>>>>>>>>  - Home owner name
>>>>>>>>  - Inspection date
>>>>>>>>
>>>>>>>> There are about 15,000 inspection documents generated per month.
>>>>>>>>
>>>>>>>> I need to quickly retrieve a list of inspections for January, sorted
>>>>>>>> by home owner name.
>>>>>>>>
>>>>>>>> The issue I am running into is that the size of the result
>>>>>>>> set requires paging the data using limit and startkey. This would
>>>>>>>> require that the view key be the inspection date, which means the
>>>>>>>> results cannot be sorted by home owner name. The size of the data
>>>>>>>> means that pulling it all down to the client and sorting in the
>>>>>>>> browser is not performant.
>>>>>>>>
>>>>>>>> Is there a clever way to solve this problem?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Justin
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Keith Gable
>>>>>>> A+ Certified Professional
>>>>>>> Network+ Certified Professional
>>>>>>> Web Developer
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
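
The key-ordering gotcha discussed in the quoted thread can be reproduced outside CouchDB. The sketch below assumes a simplified comparator (`compareKeys`, a hypothetical helper) that compares array keys element by element; real CouchDB collation uses ICU rules for strings, so this is only an approximation. The fourth document is a made-up addition used to show a range spanning two months.

```javascript
// Simplified stand-in for CouchDB key collation (an assumption for illustration):
// arrays compare element by element, and numbers sort before strings.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i];
    const y = b[i];
    if (typeof x !== typeof y) return typeof x === "number" ? -1 : 1;
    if (x < y) return -1;
    if (x > y) return 1;
  }
  return a.length - b.length;
}

// Split "YYYY-MM-DD" by hand to avoid any time zone conversion.
function dateParts(doc) {
  return doc.inspection_date.split("-").map(Number);
}

const docs = [
  { inspection_date: "2011-03-01", homeowner_name: "Bob" },
  { inspection_date: "2011-03-02", homeowner_name: "Keith" },
  { inspection_date: "2011-03-03", homeowner_name: "Alice" },
  // Hypothetical extra document, added only to span a second month.
  { inspection_date: "2011-02-10", homeowner_name: "Zed" }
];

// Return homeowner names in the order a view with the given key would list them.
function namesInKeyOrder(keyOf) {
  return docs
    .map(function (doc) { return { key: keyOf(doc), name: doc.homeowner_name }; })
    .sort(function (a, b) { return compareKeys(a.key, b.key); })
    .map(function (row) { return row.name; });
}

// Key shape from the combined view: [year, month, day, name].
const dayBeforeName = namesInKeyOrder(function (doc) {
  const p = dateParts(doc);
  return [p[0], p[1], p[2], doc.homeowner_name];
});

// Key shape from the follow-up suggestion: [year, month, name, day].
const nameBeforeDay = namesInKeyOrder(function (doc) {
  const p = dateParts(doc);
  return [p[0], p[1], doc.homeowner_name, p[2]];
});

console.log(dayBeforeName);
console.log(nameBeforeDay);
```

With the name emitted before the day, the March rows come out in name order, but the February row still sorts ahead of them because the month is compared first. That is why a single compound key cannot both filter an arbitrary date range and sort the whole result by name.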
