couchdb-dev mailing list archives

From "Gabe Malicki (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (COUCHDB-2182) Why doesn't couchdb support multiple keys requests to a reduce function using group_level ?
Date Sun, 02 Mar 2014 22:21:20 GMT

     [ https://issues.apache.org/jira/browse/COUCHDB-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabe Malicki updated COUCHDB-2182:
----------------------------------

    Description: 
Why doesn't CouchDB support multi-key requests to a reduce function when using group_level?
I need this functionality in order to reduce the amount of network IO and latency in my
system.  Perhaps it could be implemented with a warning about the cost of such a query.

Here is my real-world use case:

I have a database of widgets that web users vote on, and a reduce function that computes
several kinds of ranking for each widget from the widget's numerical properties combined
with user votes (1 to 10).  There are far too many widgets to sort in the client, and CouchDB
does not allow sorting view query results by value.  To get the widgets sorted by their
ranking, I therefore first have to write the reduce function output (the score results for
each widget) to a database so that I can later query that data sorted by score.
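
For illustration only, the write-back step described above might look roughly like the
following Python sketch.  It turns grouped reduce rows into a request body for CouchDB's
_bulk_docs endpoint.  The "score:" id prefix, the "type" field, the field names, and the
assumption that view keys are arrays with the widget id first are all hypothetical; none
of them come from the report.

```python
def score_docs_from_rows(rows):
    """Turn grouped reduce rows into a body for POST /{db}/_bulk_docs.

    Assumes (hypothetically) that each row looks like
    {"key": [widget_id, ...], "value": <scores>}.
    """
    docs = []
    for row in rows:
        widget_id = row["key"][0]  # assumed key shape: widget id comes first
        docs.append({
            "_id": f"score:{widget_id}",  # hypothetical id scheme
            "type": "widget_score",       # hypothetical marker field
            "widget_id": widget_id,
            "scores": row["value"],
        })
    # Updating a document that already exists would also need its current
    # "_rev"; that lookup is left out of this sketch.
    return {"docs": docs}
```

A map view over documents like these, keyed by a score field, would then return them in
score order, which is the sorted query the paragraph above refers to.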

As new votes enter the system they are bulk_saved.  A separate view reduce query is then
executed for each affected widget to calculate its new scores, and those scores are bulk_saved
into a db so that they can later be queried sorted by their various scores.  Since CouchDB
doesn't allow querying multiple keys with my reduce function using group_level, I have to make
20 HTTP GET requests if 20 different widgets were voted on, instead of a single HTTP GET.  All
of this extra network IO means I have to fire off a bunch of threads that block on network IO,
which is expensive, and the cost grows with the number of votes I need to process per minute.
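
To make the request fan-out concrete, here is a minimal Python sketch of the one-query-per-widget
pattern described above.  It only builds the URLs and sends nothing.  The base URL, database,
design document, and view names are placeholders, and it assumes (hypothetically) that the view
emits array keys of the form [widget_id, ...], so that the range [widget_id] .. [widget_id, {}]
covers exactly one widget.

```python
import json
from urllib.parse import urlencode


def widget_reduce_url(base_url, db, ddoc, view, widget_id, group_level=1):
    """Build the URL of one grouped reduce query covering a single widget."""
    compact = (",", ":")  # no spaces inside the JSON-encoded keys
    params = {
        "startkey": json.dumps([widget_id], separators=compact),
        # {} collates after other JSON values in CouchDB view ordering, so
        # [widget_id, {}] closes the range for this widget's array keys.
        "endkey": json.dumps([widget_id, {}], separators=compact),
        "group_level": group_level,
    }
    return f"{base_url}/{db}/_design/{ddoc}/_view/{view}?{urlencode(params)}"


def urls_for_voted_widgets(base_url, db, ddoc, view, widget_ids):
    """One URL per distinct widget: 20 voted widgets means 20 GET requests."""
    return [
        widget_reduce_url(base_url, db, ddoc, view, widget_id)
        for widget_id in sorted(set(widget_ids))
    ]
```

Each URL is a separate round trip, which is the overhead a multi-key request with group_level
would avoid.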

Supporting multi-key queries with group_level would reduce the amount of CPU, latency, and
bandwidth that I'm having to burn in order to sort reduce function output by value.  Of course,
if someone could figure out how to let me sort reduce output by value directly, that would save
me even more resources.


  was:
Why doesn't couchdb support multiple keys requests to a reduce function using group_level
?  I need this functionality in order to reduce the amount of  network IO and latency in my
system.  Perhaps it would be possible to implement this with a warning about the cost of such
a query.  

Here is my example real-world use case:

I have a database of widgets that are being voted on by web users and I have a reduce function
that computes different types of ranking for each widget based on the widget's numerical properties
in conjunction with user votes (1 to 10).  There are far too many widgets to sort in the client
and couchdb does not allow the sorting of view query results by value.   This means in order
to be able to query the widgets sorted by their ranking I need to first write the reduce function
output (score results) to a database so that I can later query on that data sorted by score.
  

As new votes enter the system they are bulk_saved then a separate view reduce query is executed
for each effected widget to calculate the new scores then those scores are bulk_saved into
a db so that they can later be queried sorted by their various scores.  Since couchdb doesn't
allow querying multiple keys with my reduce function using group_level I'm having to make
20 HTTP GET requests if 20 different widgets were voted on instead of a single HTTP GET and
all of this extra network io means I have to fire off a bunch of threads to block on network
IO which is expensive and this cost becomes greater the more votes I need to process per minute.

Adding multiple keys query with group_level would reduce the amount of CPU, latency, and bandwidth
that I'm having to burn in order to achieve reduce function sorting by value.  Of course if
someone could figure out how to let me sort reduce output by value then that would save me
even more resources.



> Why doesn't couchdb support multiple keys requests to a reduce function using group_level ?
> -------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-2182
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2182
>             Project: CouchDB
>          Issue Type: Question
>      Security Level: public(Regular issues) 
>          Components: Database Core, HTTP Interface
>            Reporter: Gabe Malicki
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
