incubator-couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Filippo Fadda (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1868) Using multiple keys, the _all_docs built-in view acts differently then a user defined view
Date Mon, 12 Aug 2013 17:10:48 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737049#comment-13737049
] 

Filippo Fadda commented on COUCHDB-1868:
----------------------------------------

OK, I got your point.

Now, in a relational database you have joins, in CouchDB you can "simulate" a join, using
multiple views, and aggregate the obtained results. But it's pretty hard to do actually, because
CouchDB returns just the rows have been found. Remember the fields returned by a relational
DB are already joined together, in CouchDB you have to join them instead. Actually is pretty
hard to do.

As you know each row is an array.

Let's suppose your are creating an application to show the Marvel heroes. The application
shows the last 10 Marvel heroes to the user, and must display a star on each one if the hero
has been starred. So the application query a view called allLatest store in the doc 'heroes',
asking the last 10 heroes added (they are order by a UNIX timestamp, an integer value). The
query return something like:

{
  "total_rows":23370,
  "offset":0,
  "rows":[
    {"id":"6f4927a6-b891-4e12-9238-1b965d9bb0f4","key":1361281522,"value":"Spiderman"},
    {"id":"bd0678ee-61c8-4933-a09b-729bc10a4af5","key":1360580897,"value":"Superman"},
    {"id":"a6ce26b5-77e1-4a03-a96b-e63e030691a9","key":1360580766,"value":"Batman"},
    {"id":"86a0e140-cf45-4198-f571-164cdc6bf9a3","key":1360079876,"value":"Hulk"},
    {"id":"6b0dde6a-fba0-4b42-f1b6-9885e3b80e12","key":1359548425,"value":"Tor"},
    {"id":"adf0fc73-add7-41bb-80ef-f3211b310019","key":1359002601,"value":"Captain America"},
    {"id":"5555bf45-025e-454a-94bf-d14f936854ba","key":1356289394,"value":"Wolverine"},
    {"id":"6f5e4c73-4456-425a-c99e-aa8fe2c73637","key":1355942176,"value":"Magneto"},
    {"id":"f92cce76-db3b-4a1e-ecdc-ae0a751e560a","key":1355751477,"value":"Rockman"},
    {"id":"a451bda7-fb9b-4df7-c584-88d5a7ecc1f7","key":1354887987,"value":"Robin"},
  ]
}

Then the application queries another view to obtain the ones have been starred by the current
logged-in user:

{
  "total_rows":3456,
  "offset":0,
  "rows":[
    {"id":"cd0a4cad-2ff4-4fcf-9a9c-d0d2c6bc6a20","key":6f4927a6-b891-4e12-9238-1b965d9bb0f4,"value":null},
    {"id":"3756f01d-cde1-41cb-81a2-33d63d2f1b19","key":bd0678ee-61c8-4933-a09b-729bc10a4af5,"value":null},
    {"id":"5e9d6533-2859-48c9-de50-bbc5d6da279a","key":a6ce26b5-77e1-4a03-a96b-e63e030691a9,"value":null}
  ]
}

Unfortunately CouchDB returns just 3 rows. To know if a post has been starred you must write
something really awful and pretty slow, especially when you have many records. Since there
is no match between your first query and the second, you can't simply access the array using
an index, you have to cycle every single array and checking if the item array 'key' exists.
This really sucks. It's awful, slow and ugly. And this is a real case application, this happens
every day. Let me show what CouchDB should return instead, maybe using an option:

{
  "total_rows":3456,
  "offset":0,
  "rows":[
    {"id":"cd0a4cad-2ff4-4fcf-9a9c-d0d2c6bc6a20","key":"6f4927a6-b891-4e12-9238-1b965d9bb0f4","value":null},
    {"id":"3756f01d-cde1-41cb-81a2-33d63d2f1b19","key":"bd0678ee-61c8-4933-a09b-729bc10a4af5","value":null},
    {"id":"5e9d6533-2859-48c9-de50-bbc5d6da279a","key":"a6ce26b5-77e1-4a03-a96b-e63e030691a9","value":null},
    {"id":null,"key":"86a0e140-cf45-4198-f571-164cdc6bf9a3","value":null},
    {"id":null,"key":"6b0dde6a-fba0-4b42-f1b6-9885e3b80e12","value":null},
    {"id":null,"key":"adf0fc73-add7-41bb-80ef-f3211b310019","value":null},
    {"id":null,"key":"5555bf45-025e-454a-94bf-d14f936854ba","value":null},
    {"id":null,"key":"6f5e4c73-4456-425a-c99e-aa8fe2c73637","value":null},
    {"id":null,"key":"f92cce76-db3b-4a1e-ecdc-ae0a751e560a","value":null},
    {"id":null,"key":"a451bda7-fb9b-4df7-c584-88d5a7ecc1f7","value":null}
  ]
}

As you can see above, when a key is not matched the id is null. But the results are in the
same order and it's easy to know there is a match just checking if the id is NULL. This is
fuckin awesome, because you can write something like this:


    // Posts.
    $opts = new ViewQueryOpts();
    $opts->doNotReduce()->reverseOrderOfResults();
    $opts->setLimit(30);
    $posts = $this->couch->queryView("posts", "allLatest", NULL, $opts)->getBodyAsArray();

    // Extracts the post ids.
    $keys = [];
    foreach ($posts["rows"] as $row)
      $keys[] = $row["id"];

    // Stars.
    $opts->reset();
    $opts->doNotReduce();

    $opts->returnsNotMatchedKeys; // This is the special option 'returns_not_matched=true'.

    $stars = $this->couch->queryView("stars", "perItem", $keys, $opts)->getBodyAsArray();

    // And wow, you can finally make a for cycle to print your posts.
    $postsCount = count($posts["rows"]);
    for ($i = 0; $i < $postCount - 1; $i++) {
      echo $posts["rows"][$i]["value"]; // The hero name.

      if (is_null($stars["rows"][$i]["id"]))
        echo "not starred"; // This means the hero is not starred, there is no 'join'.
      else
        echo "starred"; // A document id exist for the key, so yes there is a 'join', the
hero has been starred.
    }


                
> Using multiple keys, the _all_docs built-in view acts differently then a user defined
view
> ------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-1868
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1868
>             Project: CouchDB
>          Issue Type: Bug
>          Components: View Server Support
>            Reporter: Filippo Fadda
>
> When you query a view using multiple keys, the _all_docs built-in view acts differently
then a user defined view:
> 1) in the first case CouchDB returns "not_found" for every not found key;
> 2) querying a user defined view produces, instead, an empty array.
> In the first case you obtain error="not_found" for every key, when you query a user defined
view you simply don't get any rows, just the total rows for the view.
> See: http://pastebin.com/D7NExJrd
> Now, regarding 'keys' the documentation says something like: "Used to retrieve just the
view rows matching that set of keys. Rows are returned in the order of the specified keys."
> In a normal case, CouchDB should return just a row for each matched key, but it will
really help, having an option to return a row for every key, even there if not found, because
it's more easy, cycle through results.
> Let's suppose the application I'm doing gets the last 30 blog posts, displaying for each
one, information that are stored into related documents. The application will query, using
as keys the posts' identifiers, other views to get, for example, if a post has been starred
from the current logged-in user, etc.
> If a view always returns a number of rows equals to the number of keys, the application
can cycle from 0 to 29 and display all the related information for a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message