couchdb-user mailing list archives

From Nils Breunese <N.Breun...@vpro.nl>
Subject Re: Using CouchDB to represent the tokenized text of a book
Date Wed, 03 Nov 2010 16:36:53 GMT
Weston Ruter wrote:

> Specifically, I'm looking at books that are in a constant flux, i.e. books that are being
> edited. The application here is for Bible translations in particular, where each word token
> needs to be keyed into other metadata, like link to source word, insertion datetime, translator,
> etc. Now that I think of it, in order to be referencable, each token would have to exist as
> a separate document anyway since parts of documents aren't indexed by ID, I wouldn't think.

That's right. You'll definitely want to use a document per token here.
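
As a rough sketch of what a document per token could look like, here is one token and one piece of external metadata pointing at it. Apart from "_id", which CouchDB uses as the document ID, every field name and value below is made up for illustration and is not something Weston or CouchDB prescribes.

```python
import json

# Hypothetical token document. Only "_id" has meaning to CouchDB;
# the other field names are assumptions for this example.
token_doc = {
    "_id": "token-000123",
    "type": "token",
    "text": "beginning",
    "translator": "example-translator",
    "inserted_at": "2010-11-03T16:36:53Z",
    "source_word_id": "source-000045",
}

# A separate document can link to the token by its ID, which is
# not possible for a fragment inside one large book document.
note_doc = {
    "_id": "note-000001",
    "type": "note",
    "token_id": token_doc["_id"],
    "body": "Alternative rendering suggested.",
}


def references(doc, token):
    """Return True if doc points at the given token by ID."""
    return doc.get("token_id") == token["_id"]


# Both documents are plain JSON, so they serialize as-is.
print(json.dumps(token_doc, indent=2))
print(references(note_doc, token_doc))
```

Because every token has its own ID, metadata from other sources only needs to store that ID to stay attached to the token.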

> I never thought about using a linked list before for this application, good idea. It
> would certainly speed up the update process, but it would make retrieving all tokens for a
> structure between a start token and end very slow as there would need to be a separate query
> for each of the tokens in the structure to look up each next token to retrieve.

Yep, that's the trade-off of linked lists. O(1) for inserts, but O(n) for lookups. Arrays
are the other way around.
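
To make the trade-off concrete, here is a small sketch that uses a plain Python dict as a stand-in for the database. The "next" field, the token IDs and the helper names are all assumptions for the example; it does not talk to CouchDB.

```python
store = {}    # stand-in for the database: _id -> document
fetches = 0   # counts document lookups made while reading


def get(doc_id):
    """Fetch one document by ID and count the lookup."""
    global fetches
    fetches += 1
    return store[doc_id]


def insert_after(prev_id, new_id, text):
    """Insert a token after prev_id; touches only two documents."""
    prev = store[prev_id]
    store[new_id] = {"_id": new_id, "text": text, "next": prev["next"]}
    prev["next"] = new_id


def read_range(start_id, end_id):
    """Follow "next" pointers from start_id to end_id, one lookup per token."""
    words = []
    current = start_id
    while current is not None:
        doc = get(current)
        words.append(doc["text"])
        if current == end_id:
            break
        current = doc["next"]
    return words


store["t1"] = {"_id": "t1", "text": "In", "next": "t2"}
store["t2"] = {"_id": "t2", "text": "the", "next": "t3"}
store["t3"] = {"_id": "t3", "text": "beginning", "next": None}

insert_after("t2", "t4", "very")   # no other token needs renumbering
words = read_range("t1", "t3")
print(words, fetches)
```

The insert changes two documents no matter how long the book is, provided you already know the preceding token. Reading a range costs one lookup per token, which is the slow part Weston describes.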

> As I mentioned above, metadata and related data are both going to be externally attached
> to each token at various sources, so each token needs to be referenced by ID. This fact alone
> invalidates a single-document approach because parts of a document can't be linked to, correct?

Correct. Well, you could maybe construct a document with sections which have IDs of their
own, but that doesn't sound very relaxing.

Nils Breunese.
------------------------------------------------------------------------
 VPRO
 phone:  +31(0)356712911
 e-mail: info@vpro.nl
 web:    www.vpro.nl
------------------------------------------------------------------------
