couchdb-user mailing list archives

From "Paul Davis" <paul.joseph.da...@gmail.com>
Subject Re: 3 Newbie questions
Date Thu, 25 Sep 2008 14:25:20 GMT
Question 1.

Really. It depends. I could see storing them as attachments on a
description document just the same as anything else.

The Base64 requirement is old. There was a fairly recent patch to allow
plain binary. I'm not sure whether that made it into 0.8.1, or whether
relaxdb is simply out of date.
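
A minimal sketch of what such a description document might look like,
assuming Node.js (for Buffer) and CouchDB's inline `_attachments` form with
Base64-encoded data. The helper name buildRemoteUrlDoc, the attachment name
"original", and the sample URL are hypothetical; the other field names come
from the RemoteUrl document described in the question.

```javascript
// Hypothetical helper: builds a description document whose raw page
// content lives in an attachment instead of a doc.content field.
function buildRemoteUrlDoc(url, headers, body) {
  return {
    class: "RemoteUrl",
    normalized_url: url,
    last_modified: headers["last-modified"],
    etag: headers["etag"],
    created_at: Date.now(),
    _attachments: {
      // "original" is an illustrative attachment name.
      original: {
        content_type: headers["content-type"] || "application/octet-stream",
        // Inline attachments travel Base64-encoded inside the JSON body.
        data: Buffer.from(body).toString("base64")
      }
    }
  };
}

var doc = buildRemoteUrlDoc(
  "http://example.com/menu",
  { "content-type": "text/html", "etag": "abc" },
  "<html>menu</html>"
);
console.log(JSON.stringify(doc, null, 2));
```

With this shape the metadata stays in ordinary JSON fields while the page
body sits in the attachment, so map functions that only look at the
metadata never need to touch the content.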

For speed issues, make sure your Erlang version is fairly recent. There
was a bug in one of the old VMs that slowed couch down. I'd recommend
downloading the latest release candidate.

Question 2.

There aren't any specific 'best practices' yet regarding 1 vs 2
databases. Some people are dealing with projects that have hundreds of
thousands of individual databases. Some with a single database. Really
it depends on your situation and more specifically how you want to
view your data. Basically, in multiple databases, you're losing the
power of view generation that you might end up needing.

What kind of times are you seeing on view generation? 200K docs
shouldn't take *that* long. And once the initial generation is done
access should be in the range of milliseconds. See the above note
about updating erlang on your machine.
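
If generation is still slow, the two-query idea from Question 1 can be
sketched as below. This is only an illustration, not a tested recipe: the
emit() stub, the currentId variable, and latestIdFor() are stand-ins so the
code runs outside couch. Against a real view the lookup would be a
descending key-range query that returns one row, followed by a fetch of
that document by id.

```javascript
// Stand-in for CouchDB's emit(), so the sketch runs outside the view server.
// A real view row carries the document id automatically; here it is tracked
// by hand through currentId.
var rows = [];
var currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Map function: the key holds the URL and timestamp and the value is null,
// so doc.content is never copied into the index.
function map(doc) {
  if (doc.class == "RemoteUrl" && doc.content) {
    emit([doc.normalized_url, doc.created_at], null);
  }
}

// Local simulation of "newest row for this URL".
function latestIdFor(url) {
  var best = null;
  rows.forEach(function (row) {
    if (row.key[0] === url && (best === null || row.key[1] > best.key[1])) {
      best = row;
    }
  });
  return best ? best.id : null;
}

// Made-up sample documents.
var docs = [
  { _id: "a", class: "RemoteUrl", normalized_url: "http://example.com/", created_at: 1, content: "v1" },
  { _id: "b", class: "RemoteUrl", normalized_url: "http://example.com/", created_at: 3, content: "v3" },
  { _id: "c", class: "RemoteUrl", normalized_url: "http://example.com/", created_at: 2, content: "v2" },
  { _id: "d", class: "Restaurant", name: "Some place" }
];
docs.forEach(function (doc) { currentId = doc._id; map(doc); });

console.log(latestIdFor("http://example.com/"));
```

Because the rows sort by URL and then by created_at, no reduce step is
needed to find the latest revision, and the index stays small.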

Question 3.

There really isn't a good way to do such a query in couchdb atm. But
the good news is that there's work going on right now refactoring
couchdb's http interface such that you'll be able to specify
additional external indexing methods. So writing your own index that
could perform that query would be achievable. I like to dream that at
some point couch will have a large selection of indexing methods
available.
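
Until then, a partial workaround is to let the view handle one dimension
and filter the other on the client. A rough sketch under stated
assumptions: the "Place" class, the emit() stub, currentId, and inBox() are
all hypothetical, and the first filter merely simulates what a
startkey/endkey range on the view would return. It pulls back every row in
the latitude band, so it only makes sense when that band is small.

```javascript
// Stand-in for CouchDB's emit(), so the sketch runs outside the view server.
var rows = [];
var currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Map function: latitude is the key (range-queryable), longitude the value.
function map(doc) {
  if (doc.class == "Place" &&
      typeof doc.lat == "number" && typeof doc.longitude == "number") {
    emit(doc.lat, doc.longitude);
  }
}

// Returns ids of places inside the box.
function inBox(viewRows, minLat, maxLat, minLng, maxLng) {
  return viewRows
    // Simulates the key range the view itself would serve.
    .filter(function (r) { return r.key >= minLat && r.key <= maxLat; })
    // Second dimension: has to be checked client side.
    .filter(function (r) { return r.value >= minLng && r.value <= maxLng; })
    .map(function (r) { return r.id; });
}

// Made-up sample documents.
var docs = [
  { _id: "p1", class: "Place", lat: 40.7, longitude: -74.0 },
  { _id: "p2", class: "Place", lat: 40.8, longitude: -73.0 },
  { _id: "p3", class: "Place", lat: 41.5, longitude: -74.0 }
];
docs.forEach(function (doc) { currentId = doc._id; map(doc); });

console.log(inBox(rows, 40.5, 41.0, -74.5, -73.5));
```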

Side note: there are some people working on geo data in couch. At the
moment Google mostly shows a lot of people storing the data. I did find a
blog entry [1] that mentioned trying to tackle the querying.

[1] http://vmx.cx/cgi-bin/blog/index.cgi/category/CouchDB

HTH,

Paul


On Thu, Sep 25, 2008 at 9:17 AM, Will Schenk <wschenk@gmail.com> wrote:
> Hey all
>
> I went to jchris' talk at columbia a week or so ago, and have been playing
> around with couchdb ever since.  I'm not sure that I like all of the high
> level design decisions (I for one like types, and I think I'm running into
> this below) but I wanted to actually use it because there certainly are some
> neat things.  I have a couple of newbie usage questions.
>
> The scenario is that I'm building a spider-type thing and going to be
> processing the "remote resource" into a specific localized "document".  So
> I'm going to be pulling in say 3 pages and from that going to produce both a
> document that describes the restaurant, has its menu and reviews, knows
> its lat and long and going to serve them up on a map.  I'm following the
> architecture that I described in http://benchcoach.com/papers/scraping and
> am basically reimplementing menumaps as a proof of concept with couchdb.
>
> I'm using merb and relaxdb at the moment, but I think I may need to get a
> little lower level.
>
> Question 1:  Where do I store the original documents?
>
> Right now I have a "RemoteUrl" document which contains the last-modified,
> etag, encoding, and the content itself.  (It's very important in the design
> that I keep the original content and all previous versions around.)  For
> some reason, I can't store the content directly -- I need to base64 encode
> it, which seems like a problem with the ruby json library.  But couchdb is
> slow when it has all of these 200K documents sitting around in it.  Is this
> not the right sort of usage?  I've created a map/reduce view for the "latest
> revision" like this:
>
> function(doc) {
>   if( doc.class == "RemoteUrl" && doc.content ) {
>     emit(doc.normalized_url, doc );
>   }
>  }
>
> function(key, values, rereduce) {
>   // values holds full docs in both passes: emitted docs on the first
>   // reduce, previously chosen docs on rereduce. Pick the newest either way.
>   var doc = values[0];
>   for (var i = 1; i < values.length; i++) {
>     if (values[i].created_at > doc.created_at) {
>       doc = values[i];
>     }
>   }
>   return doc;
> }
>
> but it takes a long time to run and see.  But this is probably because it
> actually needs to load up all the data, send it over a pipe to the
> javascript process, have it do it, send it back, and repeat the process with
> the reduce step.  It's pretty slow.  Is there a better way to do this?  My
> guess is that if I make this into two queries, and don't emit the
> doc.content (the actual content of the file) that it would be a lot
> faster, but that seems pretty ugly.  I.e. just map [doc._id,doc.created_at]
> and then make another trip to pull back the id.  But you still have the
> problem of view creation taking forever.  Does anyone have any suggestions?
>
> Question 2: I'm using RelaxDB right now, and it only really wants to work
> for one database per environment.  Seeing how slowly couchdb is processing
> these documents, I was thinking that I'd want to keep the bulk-data stuff in
> its own database so that the other views won't need to process over the
> whole data set.  The views really work by document "type", so there's no
> need to pump the huge amount of data from the erlang process to the
> javascript process when all it's doing is seeing that doc.class !=
> "RemoteUrl".  (Which is why I'd want to have types, but no matter, I guess
> we can hack them on the side like this!)  I'm guessing that's why it's
> falling down.  So if I could split out the "web-cache" database from "parsed"
> database I think it would be a little faster.  I'm wondering what people think
> about this sort of design decision, and how they would suggest implementing
> it.
>
> Question 3: Say I eventually get something which has 2 floating point
> attributes.  (e.g. lat and long.)  How would I get all of the documents
> where those were in a square?  i.e.
> select * from places where places.lat >= left_lat and places.lat <=
> right_lat and places.longitude >= top_longitude and places.longitude <=
> bottom_longitude;
>
> I can see how you'd do this with one dimension, but I'm not sure how you'd
> do it with the second.  Especially since you need to make all these data
> round trips...
>
> Thanks in advance.  I think there are a lot of very interesting ideas in
> Couchdb. It seems like a lot of this stuff can't be done nearly as well as
> with a sql database, but I'm hoping that it's just me being ignorant.
>
> -w
> http://sublimeguile.com
>
>
>
