From: Nils Breunese
To: "user@couchdb.apache.org"
Date: Wed, 3 Nov 2010 17:36:53 +0100
Subject: Re: Using CouchDB to represent the tokenized text of a book

Weston Ruter wrote:

> Specifically, I'm looking at books that are in a constant flux, i.e.
> books that are being edited. The application here is for Bible
> translations in particular, where each word token needs to be keyed
> into other metadata, like link to source word, insertion datetime,
> translator, etc. Now that I think of it, in order to be referenceable,
> each token would have to exist as a separate document anyway since
> parts of documents aren't indexed by ID, I wouldn't think.

That's right. You'll definitely want to use a document per token here.

> I never thought about using a linked list before for this application,
> good idea. It would certainly speed up the update process, but it would
> make retrieving all tokens for a structure between a start token and
> end very slow as there would need to be a separate query for each of
> the tokens in the structure to look up each next token to retrieve.

Yep, that's the trade-off of linked lists. O(1) for inserts, but O(n) for
lookups. Arrays are the other way around.

> As I mentioned above, metadata and related data are both going to be
> externally attached to each token at various sources, so each token
> needs to be referenced by ID. This fact alone invalidates a
> single-document approach because parts of a document can't be linked
> to, correct?

Correct. Well, you could maybe construct a document with sections which
have IDs of their own, but that doesn't sound very relaxing.

Nils Breunese.
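
The linked-list trade-off discussed above can be sketched roughly as follows. This is only an illustration under assumptions: a plain Python dict stands in for the database (no real CouchDB calls are made), and the field names `text` and `next`, the IDs `t1`..`t3`, and the helpers `put_token`, `insert_after` and `walk` are all hypothetical.

```python
# Stand-in for a database: maps document ID -> document body.
# Assumption: one document per token, each pointing at the next token's ID.
db = {}

def put_token(doc_id, text, next_id=None):
    """Store one token document (hypothetical layout)."""
    db[doc_id] = {"_id": doc_id, "text": text, "next": next_id}

def insert_after(prev_id, new_id, text):
    """Insert a token after prev_id. Only two documents are written,
    no matter how long the book is."""
    prev = db[prev_id]
    put_token(new_id, text, next_id=prev["next"])
    prev["next"] = new_id

def walk(start_id, end_id):
    """Collect token texts from start_id through end_id.
    Needs one lookup per token, so cost grows with the range length."""
    texts = []
    current = start_id
    while current is not None:
        doc = db[current]
        texts.append(doc["text"])
        if current == end_id:
            break
        current = doc["next"]
    return texts

put_token("t1", "In", next_id="t2")
put_token("t2", "beginning")
insert_after("t1", "t3", "the")
print(walk("t1", "t2"))
```

Against a real database each `db[...]` lookup in `walk` would presumably be a separate request, which is the slowness Weston describes; the insert stays cheap because no other token documents need renumbering.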