incubator-couchdb-user mailing list archives

From Zachary Zolton <zachary.zol...@gmail.com>
Subject Re: Paging large result sets with sorting
Date Fri, 18 Mar 2011 14:14:10 GMT
I've made good use of CouchDB-Lucene in the past, but haven't had a
chance to play around with ElasticSearch.

Another alternative would be to schedule a background process to
create a summary document for each month's data.
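Here is a minimal sketch of the summary-document idea. It assumes inspection docs shaped like the ones in Justin's example further down. The `_id` pattern `inspection-summary-YYYY-MM`, the `type` value and the `inspections` field are all hypothetical. A scheduled job could save the returned docs, for example via CouchDB's `_bulk_docs` endpoint.

```javascript
// Group inspection docs by "YYYY-MM" and build one summary doc per month,
// with that month's inspections pre-sorted by homeowner name.
// The _id format, type value and field names are hypothetical.
function buildMonthlySummaries(docs) {
  var byMonth = {};
  docs.forEach(function (doc) {
    if (!doc.inspection_date || !doc.homeowner_name) return;
    var month = doc.inspection_date.slice(0, 7); // "2011-03-01" -> "2011-03"
    (byMonth[month] = byMonth[month] || []).push({
      homeowner_name: doc.homeowner_name,
      inspection_date: doc.inspection_date
    });
  });
  return Object.keys(byMonth).sort().map(function (month) {
    var inspections = byMonth[month].sort(function (a, b) {
      var x = a.homeowner_name, y = b.homeowner_name;
      return x < y ? -1 : x > y ? 1 : 0;
    });
    return {
      _id: "inspection-summary-" + month,
      type: "monthly_summary",
      month: month,
      inspections: inspections
    };
  });
}

var summaries = buildMonthlySummaries([
  { inspection_date: "2011-03-01", homeowner_name: "Bob" },
  { inspection_date: "2011-03-02", homeowner_name: "Keith" },
  { inspection_date: "2011-03-03", homeowner_name: "Alice" }
]);
console.log(JSON.stringify(summaries.map(function (s) {
  return [s._id, s.inspections.map(function (i) { return i.homeowner_name; })];
})));
// [["inspection-summary-2011-03",["Alice","Bob","Keith"]]]
```

A summary like this answers whole-month questions cheaply. On its own it does not cover the arbitrary date ranges Justin mentions later in the thread.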

On Fri, Mar 18, 2011 at 8:41 AM, Justin Walgran <jwalgran@azavea.com> wrote:
> Thanks for the suggestion, Zach. The problem I'm running into is that
> there are too many results to sort quickly in a _list function or on
> the client.
>
> It is looking more and more like hooking up some flavor of Lucene may
> be the only way to solve this problem.
>
> Does anyone have recommendations on using ElasticSearch vs. CouchDB-Lucene?
>
> Justin
>
> On Thu, Mar 17, 2011 at 5:23 PM, Zachary Zolton
> <zachary.zolton@gmail.com> wrote:
>> Justin,
>>
>> Depending on your intended usage, it may be acceptable to just use the
>> view to filter by the desired month and then perform your sort in
>> client-side code. Alternatively, you could do the sorting server-side
>> in a _list function, but this may put quite a burden on your CouchDB
>> server if you're making a high volume of these queries.
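Both options from the paragraph above can be sketched as follows. The sketch assumes the view is queried for one month with `include_docs=true`, so each row carries its doc. The names `sortRowsByHomeowner` and `listSortedByHomeowner` are hypothetical. `getRow()` and `send()` are the functions CouchDB's JavaScript query server provides to `_list` functions, so they are not defined here.

```javascript
// Client side: sort the rows of a month-filtered view response by the
// homeowner name in each row's doc (requires include_docs=true).
function compareByHomeowner(a, b) {
  var x = a.doc.homeowner_name, y = b.doc.homeowner_name;
  return x < y ? -1 : x > y ? 1 : 0;
}

function sortRowsByHomeowner(rows) {
  return rows.slice().sort(compareByHomeowner);
}

// Server side: the same sort inside a _list function. CouchDB stores each
// list function as its own source string, so the comparator is repeated.
// Every row is buffered in memory before anything is sent; with roughly
// 15,000 rows per month this is the server burden described above.
var listSortedByHomeowner = function (head, req) {
  var rows = [];
  var row;
  while ((row = getRow())) { // getRow() returns a falsy value when rows run out
    rows.push(row);
  }
  rows.sort(function (a, b) {
    var x = a.doc.homeowner_name, y = b.doc.homeowner_name;
    return x < y ? -1 : x > y ? 1 : 0;
  });
  send(JSON.stringify({ rows: rows }));
};
```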
>>
>> Also, CouchDB-Lucene is very capable of querying ranges in one field
>> while sorting on an additional field.
>>
>>
>> Cheers,
>>
>> Zach
>>
>> On Thu, Mar 17, 2011 at 3:34 PM, Justin Walgran <jwalgran@azavea.com> wrote:
>>> I'm sorry, I oversimplified my problem statement. Your solution is
>>> correct if I only need to select by month. Unfortunately I also need
>>> to support an arbitrary inspection date range for filtering results.
>>> February 6th to March 14th, for example. This is where the trouble
>>> creeps in.
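The range requirement works like this. With a date-first key such as the `[year, month, day]` or `[year, month, day, name]` keys discussed in this thread, an arbitrary range is still one contiguous `startkey`/`endkey` slice. The rows in it come back in date order, though. The hypothetical helper below builds those parameters from two `YYYY-MM-DD` strings. The trailing `{}` in `endkey` collates after any number or string, so every row on the last day is included. View keys must be JSON, so the values are JSON-encoded and then URL-encoded.

```javascript
// Hypothetical helper: turn "YYYY-MM-DD" bounds into view query parameters
// for a view whose keys start with [year, month, day].
function dateRangeParams(from, to) {
  function parts(s) {
    return s.split("-").map(function (p) { return parseInt(p, 10); });
  }
  var startkey = parts(from);
  var endkey = parts(to).concat([{}]); // {} sorts after any name or number
  return "reduce=false" +
    "&startkey=" + encodeURIComponent(JSON.stringify(startkey)) +
    "&endkey=" + encodeURIComponent(JSON.stringify(endkey));
}

// February 6th to March 14th, as in the example above.
console.log(dateRangeParams("2011-02-06", "2011-03-14"));
// reduce=false&startkey=%5B2011%2C2%2C6%5D&endkey=%5B2011%2C3%2C14%2C%7B%7D%5D
```

The slice this selects is ordered by date, not by homeowner name. That is the sorting problem described in this message.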
>>>
>>> Justin
>>>
>>> On Thu, Mar 17, 2011 at 4:29 PM, Keith Gable <ziggy@ignition-project.com> wrote:
>>>> Then simply emit the name before the day of the month. Then, it'll
>>>> sort by name then day of month.
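Here is a sketch of Keith's suggestion. It uses a hypothetical view (called `by_month_and_homeowner_name` here) whose key puts the name ahead of the day: `[year, month, homeowner_name, day]`. Inside CouchDB, `emit()` is supplied by the query server. The stub and the simplified `collate()` below exist only so the example runs on its own. `collate()` compares strings by code point, whereas CouchDB uses ICU collation. Run against Justin's three sample docs, a whole-month query comes back in name order.

```javascript
// Stub for CouchDB's emit() so the map function can run standalone.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map function for the hypothetical by_month_and_homeowner_name view.
function byMonthAndHomeownerName(doc) {
  if (doc.inspection_date && doc.homeowner_name) {
    var d = doc.inspection_date.split("-");
    emit([parseInt(d[0], 10), parseInt(d[1], 10), doc.homeowner_name,
          parseInt(d[2], 10)], 1);
  }
}

// Simplified CouchDB-style collation for the types used here:
// numbers < strings < arrays < objects; arrays compare element by element.
function collate(a, b) {
  function rank(v) {
    if (typeof v === "number") return 0;
    if (typeof v === "string") return 1;
    if (Array.isArray(v)) return 2;
    return 3;
  }
  var ra = rank(a), rb = rank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 2) {
    for (var i = 0; i < Math.min(a.length, b.length); i++) {
      var c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
  if (ra === 3) return 0; // objects only appear as the {} upper bound
  return a < b ? -1 : a > b ? 1 : 0;
}

var sampleDocs = [
  { inspection_date: "2011-03-01", homeowner_name: "Bob" },
  { inspection_date: "2011-03-02", homeowner_name: "Keith" },
  { inspection_date: "2011-03-03", homeowner_name: "Alice" }
];
sampleDocs.forEach(byMonthAndHomeownerName);

// Emulate ?reduce=false&startkey=[2011,3]&endkey=[2011,3,{}]
var startkey = [2011, 3];
var endkey = [2011, 3, {}];
var rows = emitted
  .filter(function (r) {
    return collate(r.key, startkey) >= 0 && collate(r.key, endkey) <= 0;
  })
  .sort(function (a, b) { return collate(a.key, b.key); });

rows.forEach(function (r) { console.log(JSON.stringify(r.key)); });
// [2011,3,"Alice",3]
// [2011,3,"Bob",1]
// [2011,3,"Keith",2]
```

This works for whole months. For a range such as February 6th to March 14th, the matching keys stop forming one contiguous slice, because within each month the days are spread across the name ordering. Justin raises that limitation in his reply.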
>>>>
>>>> On Thu, Mar 17, 2011 at 3:17 PM, Justin Walgran <jwalgran@azavea.com> wrote:
>>>>> Thanks for the thoughtful reply, Keith.
>>>>>
>>>>> Assume these input docs:
>>>>>
>>>>>  { "inspection_date": "2011-03-01", "homeowner_name": "Bob" }
>>>>>
>>>>>  { "inspection_date": "2011-03-02", "homeowner_name": "Keith" }
>>>>>
>>>>>  { "inspection_date": "2011-03-03", "homeowner_name": "Alice" }
>>>>>
>>>>> The key output from
>>>>> by_inspection_date_and_homeowner_name?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>> would be:
>>>>>
>>>>>  [2011,3,1,"Bob"]
>>>>>  [2011,3,2,"Keith"]
>>>>>  [2011,3,3,"Alice"]
>>>>>
>>>>> Which is not sorted by home owner name. That's the gotcha.
>>>>>
>>>>>
>>>>> Justin
>>>>>
>>>>> On Thu, Mar 17, 2011 at 2:13 PM, Keith Gable <ziggy@ignition-project.com> wrote:
>>>>>> Uh. This sounds simple?
>>>>>>
>>>>>> view: by_home_owner_name:
>>>>>> function (doc) {
>>>>>>   if (doc.homeowner_name) { emit(doc.homeowner_name, 1); }
>>>>>> }
>>>>>>
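>>>>>> view: by_inspection_date:
>>>>>> function (doc) {
>>>>>>   if (doc.inspection_date) {
>>>>>>     // "YYYY-MM-DD" -> [year, month, day]; splitting the string avoids
>>>>>>     // time-zone shifts and engine-specific ISO date parsing in Date
>>>>>>     var d = doc.inspection_date.split("-");
>>>>>>     emit([parseInt(d[0], 10), parseInt(d[1], 10), parseInt(d[2], 10)], 1);
>>>>>>   }
>>>>>> }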
>>>>>>
>>>>>> To look for all of my inspections:
>>>>>> ...by_home_owner_name?key="Keith Gable"
>>>>>>
>>>>>> To get all of the inspections for today:
>>>>>> ...by_inspection_date?reduce=false&key=[2011,3,17]
>>>>>>
>>>>>> To get all of the inspections for this month:
>>>>>> ...by_inspection_date?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>>>
>>>>>>
>>>>>> Combining the two:
>>>>>>
>>>>>> view: by_inspection_date_and_homeowner_name:
>>>>>> function (doc) {
>>>>>>   if (doc.inspection_date && doc.homeowner_name) {
>>>>>>     var d = doc.inspection_date.split("-");
>>>>>>     emit([parseInt(d[0], 10), parseInt(d[1], 10), parseInt(d[2], 10),
>>>>>>           doc.homeowner_name], 1);
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> ...by_inspection_date_and_homeowner_name?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>>>
>>>>>> Will result in:
>>>>>> [2011,3,1,"Alice"]
>>>>>> [2011,3,1,"Bob"]
>>>>>> [2011,3,2,"Keith"]
>>>>>>
>>>>>>
>>>>>> Does any of that not do what you want?
>>>>>>
>>>>>> On Thu, Mar 17, 2011 at 12:33 PM, Justin Walgran <jwalgran@azavea.com> wrote:
>>>>>>> Assume a CouchDB storing and indexing housing inspection records. Each
>>>>>>> inspection document has two important fields.
>>>>>>>
>>>>>>>  - Home owner name
>>>>>>>  - Inspection date
>>>>>>>
>>>>>>> There are about 15,000 inspection documents generated per month.
>>>>>>>
>>>>>>> I need to quickly retrieve a list of inspections for January, sorted
>>>>>>> by home owner name.
>>>>>>>
>>>>>>> The issue I am running into is the fact that the size of the result
>>>>>>> set requires paging the data using limit and startkey. This would
>>>>>>> require that the view key be the inspection date, which means the
>>>>>>> results cannot be sorted by home owner name. The size of the data
>>>>>>> means that pulling it all down to the client and sorting in the
>>>>>>> browser is not performant.
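The paging pattern mentioned here is commonly done by requesting one row more than the page size. The extra row's key and document id then become `startkey` and `startkey_docid` for the next request, and `startkey_docid` disambiguates rows that share a key. Below is a minimal sketch that operates on the `rows` array of a view response. The function name and the document ids in the usage example are hypothetical.

```javascript
// Split a view response fetched with limit = pageSize + 1 into the page to
// display and the query parameters for the next page (null on the last page).
function splitPage(rows, pageSize) {
  var page = rows.slice(0, pageSize);
  var extra = rows[pageSize];
  var next = extra === undefined ? null : {
    startkey: JSON.stringify(extra.key), // JSON text; URL-encode when building the query
    startkey_docid: extra.id,
    limit: pageSize + 1
  };
  return { page: page, next: next };
}

var result = splitPage([
  { id: "doc-bob", key: [2011, 3, 1], value: 1 },
  { id: "doc-keith", key: [2011, 3, 2], value: 1 },
  { id: "doc-alice", key: [2011, 3, 3], value: 1 }
], 2);
console.log(result.page.length, JSON.stringify(result.next));
// 2 {"startkey":"[2011,3,3]","startkey_docid":"doc-alice","limit":3}
```

The key drives both the paging and the ordering. Paging a date-keyed view this way therefore yields pages in date order, which is the conflict this message describes.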
>>>>>>>
>>>>>>> Is there a clever way to solve this problem?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Justin
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Keith Gable
>>>>>> A+ Certified Professional
>>>>>> Network+ Certified Professional
>>>>>> Web Developer
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
