couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Sorting CouchDB data using couch-lucene
Date Tue, 27 Mar 2012 17:14:51 GMT
Cross-posting is generally bad form as it splits the conversation. If
I answer here or on SO, someone potentially loses out.

That said, couchdb-lucene is an independent index, sourced from the
database. As such, it has no access to the reduced values nor does it
have an equivalent feature. Where did you see that suggestion?

B.

On 27 March 2012 18:04, nicholas a. evans <nicholas.evans@gmail.com> wrote:
> I have some summation data that was very easy to generate using some
> relatively simple map/reduce views.  But we want to sort the data
> based on the group-reduced view *values* (not the keys).  It was
> suggested that we could use couchdb-lucene to do this.  But how?  It's
> not clear to me how to use a full text index to quickly rank this sort
> of data.
>
> **What we already have**
>
> An oversimplified example view looks something like the following:
>
>    by_sender: {
>      map: "function(doc) { emit(doc.sender, 1); }",
>      reduce: "function(keys, values, rereduce) { return sum(values); }"
>    }
>
> Which returns results somewhat like the following (when run with `group=true`):
>
>     {"rows":[
>     {"key":"a@example.com","value":2},
>     {"key":"aaa@example.com","value":1},
>     {"key":"aaap@example.com","value":34},
>     {"key":"aabb@example.com","value":1},
>     ... thousands or tens of thousands of rows ...
>     ]}
>
> **What we want**
>
> Those are sorted by the key, but I need to sort the data according to the
> values, like so:
>
>     {"rows":[
>     {"key":"xyzzy@example.com","value":847},
>     {"key":"adam@example.com","value":345},
>     {"key":"karl@example.com","value":99},
>     {"key":"aaap@example.com","value":34},
>     ... thousands or tens of thousands of rows ...
>     ]}
>
> **More context: what we already tried**
>
> The best answer on
> http://stackoverflow.com/questions/2817703/sorting-couchdb-views-by-value
> gives four viable options, which we've tried in increasing order of
> difficulty:
>
>  1. First we sorted the results client side, but that was *way* too slow.
>  2. Next we created a list view which sorts the data.  A little
> faster, but still too slow.
>  3. Chained Map-Reduce Views should handle this problem easily.
>    - Someone pointed out Cloudant's Chained Map-Reduce Views.  They
> are not in BigCouch but are part of Cloudant's services, which are
> unfortunately not in our budget at this time.
>    - I started an application layer implementation using the
> _bulk_docs API.  It is tricky if you want to keep updates as snappy as
> possible while avoiding race conditions, etc.  I can continue with
> this approach, but it is *not* relaxing.  :(
>  4. The answer suggested using couchdb-lucene.  But I'm not nearly
> familiar enough with full-text search to understand how to get it to
> do anything more sophisticated than index the document and return a
> search result.  I don't even know where to start.
>
> I also posted this at
> http://stackoverflow.com/questions/9893759/sorting-couchdb-data-using-couch-lucene
> Is it bad form to post the question in both places?  I hope not.  :)
>
> Has someone already shared an open source implementation of map-reduce
> chaining?  Are there other good approaches?  Or is this a
> hammer/screwdriver problem: should we be looking outside of couchdb to
> handle this particular type of data analysis?  E.g. monitor the
> changes feed and run "zincrby messages:by_sender 1 $sender" for every
> new row.
>
> Thanks for your consideration!
> --
> Nick Evans
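
For reference, the last idea in the quoted message (keep a running count per sender outside CouchDB and read it back ordered by count) can be sketched roughly as follows. This is only an illustration under stated assumptions: `ScoreBoard`, `incrBy`, and `top` are hypothetical names, a plain in-memory Map stands in for the Redis sorted set, and the `changes` array stands in for rows arriving from the changes feed. Updates and deletions of existing documents are not handled.

```javascript
// Hypothetical in-memory stand-in for a Redis sorted set (not a Redis client).
class ScoreBoard {
  constructor() {
    this.scores = new Map(); // member -> running total
  }

  // Mirrors the effect of ZINCRBY: add `amount` to the member's score,
  // starting from 0 if the member is new. Returns the new score.
  incrBy(member, amount) {
    const next = (this.scores.get(member) || 0) + amount;
    this.scores.set(member, next);
    return next;
  }

  // Return the n highest-scoring members, highest first, in the same
  // {key, value} shape as the grouped view rows. Ties are broken by key
  // (ascending) purely to make the output deterministic in this sketch.
  top(n) {
    return Array.from(this.scores, ([key, value]) => ({ key, value }))
      .sort((a, b) =>
        b.value - a.value || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
      .slice(0, n);
  }
}

// Simulated input: one entry per new document, as a changes-feed
// listener might hand them over (assumed shape).
const board = new ScoreBoard();
const changes = [
  { sender: "karl@example.com" },
  { sender: "adam@example.com" },
  { sender: "adam@example.com" },
];
for (const doc of changes) {
  board.incrBy(doc.sender, 1);
}

console.log(board.top(2));
```

This sketch re-sorts on every read, which is fine for illustration; a real sorted set keeps its members ordered by score, which is the reason to reach for one when there are tens of thousands of senders.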
