incubator-couchdb-user mailing list archives

From Nils Breunese <N.Breun...@vpro.nl>
Subject RE: Looking for advice using CouchDB for a FreeSoftware project
Date Sat, 13 Jun 2009 16:24:39 GMT
Hello,

Some ideas:

- I think I'd make the movie hash a regular field in the document instead of the _id. Then
you can have multiple subtitle documents for a movie, and you could create a view that
emits this field as a key so you can query for it.
- JSON has booleans, so I'd make fansub a boolean instead of a True or False string.
- You could use attachments to store the subtitle files.
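
A minimal sketch of the view idea above, not a definitive implementation. The design document name "subtitles", the view name "by_movie_hash" and the field name "movie_hash" are assumptions made for illustration; none of them come from the draft. The map function itself is JavaScript, as CouchDB expects; the Python function only mimics locally which rows a query for one key would return.

```python
import json

# Hypothetical design document. The names "subtitles", "by_movie_hash"
# and the field "movie_hash" are illustrative assumptions.
DESIGN_DOC = {
    "_id": "_design/subtitles",
    "views": {
        "by_movie_hash": {
            # CouchDB map functions are written in JavaScript.
            "map": (
                "function(doc) {"
                "  if (doc.type === 'subtitlefile' && doc.movie_hash) {"
                "    emit(doc.movie_hash, null);"
                "  }"
                "}"
            )
        }
    },
}


def by_movie_hash(docs, movie_hash):
    """Pure-Python stand-in for querying the view with a single key."""
    rows = [
        {"id": d["_id"], "key": d["movie_hash"], "value": None}
        for d in docs
        if d.get("type") == "subtitlefile" and d.get("movie_hash")
    ]
    # Order rows by key, then by document id.
    rows.sort(key=lambda r: (r["key"], r["id"]))
    return [r for r in rows if r["key"] == movie_hash]


# Several subtitle documents may share one movie hash (made-up sample data).
docs = [
    {"_id": "sub-en", "type": "subtitlefile",
     "movie_hash": "8e245d9679d31e12", "language": "eng", "fansub": False},
    {"_id": "sub-de", "type": "subtitlefile",
     "movie_hash": "8e245d9679d31e12", "language": "ger", "fansub": True},
    {"_id": "sub-other", "type": "subtitlefile",
     "movie_hash": "0000000000020000", "language": "eng", "fansub": False},
]

matches = by_movie_hash(docs, "8e245d9679d31e12")
print([row["id"] for row in matches])
print(json.dumps(DESIGN_DOC, indent=2))
```

On current CouchDB versions, a design document like this would be queried with a GET on /<db>/_design/subtitles/_view/by_movie_hash?key="<hash>".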

Nils Breunese.

________________________________________
From: fana [fana@2flub.org]
Sent: Saturday 13 June 2009 16:13
To: user@couchdb.apache.org
Subject: Looking for advice using CouchDB for a FreeSoftware project

Hi,

I heard about CouchDB in a German podcast[1] last week
and I think I have found the last missing piece for a FreeSoftware project[2].

  Background:

There is a program called "SubDownloader"[3] which is an XML-RPC client
to the XML-RPC server of http://www.opensubtitles.org . It works like this:

 * You have a movie and you want a subtitle for it.
 * You open your movie with Subdownloader.
 * Subdownloader hashes[4] your movie file.
 * Subdownloader asks the XML-RPC server whether it has a subtitle for this
movie hash and, if so, downloads it.
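
The hashing step can be sketched roughly as follows. This reflects my understanding of the scheme described on the page linked in [4] (file size plus a 64-bit checksum over the first and last 64 KiB); treat it as an assumption and check it against that reference before relying on it. The function name and its stream-based signature are made up for this example.

```python
import io
import struct

CHUNK_SIZE = 64 * 1024  # bytes read from each end of the file


def movie_hash(stream, size):
    """Return a 16-digit hex hash for a seekable binary stream of `size` bytes."""
    if size < 2 * CHUNK_SIZE:
        raise ValueError("file too small to hash")
    total = size  # the checksum starts from the file size
    for offset in (0, size - CHUNK_SIZE):
        stream.seek(offset)
        chunk = stream.read(CHUNK_SIZE)
        # Interpret the chunk as little-endian unsigned 64-bit integers
        # and add them up, wrapping around at 64 bits.
        for (value,) in struct.iter_unpack("<Q", chunk):
            total = (total + value) & 0xFFFFFFFFFFFFFFFF
    return "%016x" % total


# Usage with an in-memory stand-in for a movie file (128 KiB of zeros).
data = bytes(2 * CHUNK_SIZE)
print(movie_hash(io.BytesIO(data), len(data)))
```

For a real file, the stream would come from open(path, "rb") and the size from os.path.getsize(path).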

The problem now is that the opensubtitles.org infrastructure can't handle the load
anymore[5] and it's not possible to scale it.

We are now re-implementing the XML-RPC server in Python, but designing the
database was a big headache, because we don't want to "steer the ship into the
same iceberg" as opensubtitles.org did.

I think that CouchDB is perfect for us in terms of scalability,
replication, collaboration and design changes in the future.

As I want to eliminate as many mistakes as possible from the beginning,
I would like to ask here for advice. I have created a first draft of what our
database would look like.

Would this draft work out with CouchDB or is there a better way?

SubtitleFile
------------

{
  "_id"              : "String",       (MD5 hash of subtitle file)
  "type"             : "subtitlefile",
  "format"           : "String",       (e.g. "SubRip")
  "language"         : "String",       (ISO 639-2 code)
  "hearing_impaired" : "String",       ("True" or "False")
  "fansub"           : "String",       ("True" or "False")
  "uploader"         : "String",
  "_attachments"     :

  {
    "subtitle.srt":
    {
      "content_type" : "text\/plain",
      "data"         : "VGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHRleHQ="
    }
  }

}
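
A rough sketch of how a document of this shape could be assembled in Python, under a few assumptions: the function name and its parameters are invented for the example, and the two flags are stored as JSON booleans rather than "True"/"False" strings. The inline attachment data is simply the base64 encoding of the file contents.

```python
import base64
import hashlib
import json


def build_subtitle_doc(subtitle_bytes, fmt, language, uploader,
                       hearing_impaired=False, fansub=False):
    """Build a SubtitleFile document with the subtitle as an inline attachment."""
    return {
        # MD5 hash of the subtitle file, as in the draft.
        "_id": hashlib.md5(subtitle_bytes).hexdigest(),
        "type": "subtitlefile",
        "format": fmt,
        "language": language,
        "hearing_impaired": hearing_impaired,  # boolean, not a string
        "fansub": fansub,                      # boolean, not a string
        "uploader": uploader,
        "_attachments": {
            "subtitle.srt": {
                "content_type": "text/plain",
                # Inline attachments carry base64-encoded data.
                "data": base64.b64encode(subtitle_bytes).decode("ascii"),
            }
        },
    }


# The sample text is the decoded form of the "data" value in the draft.
doc = build_subtitle_doc(b"This is a base64 encoded text",
                         "SubRip", "eng", "fana")
print(json.dumps(doc, indent=2))
```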



  THERE IS NO HOSTING OF MOVIE FILES OF THE MOVIE INDUSTRY
  (just people's file hashes)

MovieFile
---------

{
  "_id"      : "String",               (Computed hash of movie file)
  "type"     : "moviefile",
  "length"   :  number,                (seconds)
  "filesize" :  number,                (kb)
  "fps"      :  number,
  "uploader" : "String"
}

Relation
--------

{
  (here "_id" will be generated by CouchDB)
  "type"            : "relation",
  "id_subtitlefile" : "String",        (the MD5 hash of the subtitle)
  "id_moviefile"    : "String"         (the     hash of the movie file)
}


[1] http://chaosradio.ccc.de/cre125.html
[2] https://launchpad.net/osclone
[3] http://subdownloader.net
[4] http://trac.opensubtitles.org/projects/opensubtitles/wiki/HashSourceCodes
[5] http://forum.opensubtitles.org/viewtopic.php?t=1775

