Message-ID: <4C5347E4.8030106@gmail.com>
Date: Fri, 30 Jul 2010 16:45:08 -0500
From: Kevin Coombes
Organization: UT M.D. Anderson Cancer Center
To: user@couchdb.apache.org
Subject: overlap query

Hi,

I'm trying to figure out the best way to implement a query for "overlapping segments". The specific use case involves (biological) genomic data, which is naturally represented by a triple of the form [Chromosome, Start, End]. As a concrete example, the index [1, 123456, 135789] represents the segment on chromosome 1 that extends from base position 123456 through (and including) base position 135789.

The segments/documents in CouchDB came from analyzing a set of cell line DNA data to determine segments where the copy number changes. A typical query against this database (from a biologist's point of view) would be to ask what happens to these cell lines in the region of a specific gene. I can easily convert gene names to their positions in the human genome, so this translates to a query asking for all segments that overlap with the region that defines the gene.

For example, I might want to find all segments that overlap [1, 130000, 140000]. The example above should be returned as part of the results of this query.
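
To make the overlap condition concrete, here is a minimal JavaScript sketch of the test applied to the example above. The function name `overlaps` and the use of plain [Chromosome, Start, End] arrays for both the segment and the query are assumptions made for illustration only; nothing here is part of the CouchDB API.

```javascript
// Hypothetical helper: both arguments are [Chromosome, Start, End] triples
// with inclusive Start and End positions.
function overlaps(segment, query) {
  const [segChrom, segStart, segEnd] = segment;
  const [qChrom, qStart, qEnd] = query;
  // Two inclusive intervals on the same chromosome overlap when each
  // one starts no later than the other one ends.
  return segChrom === qChrom && segStart <= qEnd && segEnd >= qStart;
}

const gene = [1, 130000, 140000];
console.log(overlaps([1, 123456, 135789], gene)); // true
console.log(overlaps([1, 150000, 160000], gene)); // false: starts after the gene ends
console.log(overlaps([2, 123456, 135789], gene)); // false: different chromosome
```

Note that the test needs two independent inequalities, one on Start and one on End, in addition to the chromosome match.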

The pseudocode for the query I have in mind is something like

    if (doc.Chromosome == query.Chromosome) {
      if (doc.Start <= query.End && doc.End >= query.Start) {
        // show me this document
      }
    }

The actual view at present is much simpler, basically consisting of

    if (doc.Start) {
      emit([doc.Chromosome, doc.Start, doc.End], other-relevant-stuff)
    }

with the idea being that the query parameters should be able to find the desired segments.

The problem I have is that I cannot see a reasonable way to use the startkey and endkey parameters to identify these kinds of overlaps. Am I missing something, or is there a way within the CouchDB API to do what I want?

(One might note that the database arising from 175 cell lines contains about 300,000 documents, and that one would expect the results of most queries to contain only about 175 rows (one per cell line). This may constrain the kinds of tricks one can expect to do with additional views or with emitting more stuff.)

Thanks,
    Kevin