From: Nathan Kurz
Date: Mon, 21 Nov 2011 14:28:33 -0800
To: lucy-dev@incubator.apache.org
Subject: [lucy-dev] Scorers and Formats and Indexes, Oh-My!

A problem that I keep coming back to is how to allow custom Scorers to work efficiently with custom Index formats.
For efficiency, you want to provide direct access to the underlying data rather than requiring multiple function calls per match, but you don't want to have to subclass each Scorer for each Index. Ideally, you want every custom Scorer to work with every new Index out of the box.

One solution is to come up with a common data format that each Scorer uses, and have the Index capable of making that available to the Scorer. I thought this article did a good job of explaining this approach:

http://fgiesen.wordpress.com/2011/11/21/buffer-centric-io/

It's essentially what I was envisioning, but also includes some "tricks" that allow for easier error handling. It's not directly applicable to Lucy, but is in C and I thought it might be a good starting point for defining terms and thinking about approaches.

--nate
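To make the "common data format" idea above concrete, here is a minimal sketch in C of what a buffer-style interface between an Index and a Scorer might look like. Nothing here is Lucy's API: `PostingBuf`, `ArraySource`, `next_doc`, and `count_matches` are hypothetical names invented for illustration, and the sketch assumes doc id 0 is never a real match so it can serve as a sentinel. The error-handling trick shown (on failure, point the buffer at a dummy sentinel and record a sticky error code so the consumer never needs a per-read check) is one reading of the approach in the linked article, not a transcription of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical common format: a window of decoded doc ids.
 * The Scorer reads [cur, end) directly; the Index supplies refill. */
typedef struct PostingBuf PostingBuf;
struct PostingBuf {
    const uint32_t *cur;             /* next unread doc id */
    const uint32_t *end;             /* one past the last readable doc id */
    int             error;           /* sticky: 0 = ok, nonzero = no more data */
    int           (*refill)(PostingBuf *self); /* provided by the Index format */
    void           *state;           /* format-private state */
};

/* Assumption of this sketch: doc id 0 never denotes a real match. */
static const uint32_t SENTINEL[1] = { 0 };

/* After a failure, every refill hands back one sentinel value, so
 * readers always have something valid to dereference. */
static int fail_refill(PostingBuf *self) {
    self->cur = SENTINEL;
    self->end = SENTINEL + 1;
    return self->error;
}

/* One illustrative "Index format": an in-memory array served in chunks. */
typedef struct {
    const uint32_t *docs;   /* nonzero doc ids */
    size_t          count;  /* number of doc ids */
    size_t          pos;    /* how many have been handed out so far */
    size_t          chunk;  /* maximum doc ids per refill */
} ArraySource;

static int array_refill(PostingBuf *self) {
    ArraySource *src = (ArraySource *)self->state;
    if (src->pos >= src->count) {
        self->error  = 1;            /* exhausted; the error is sticky */
        self->refill = fail_refill;
        return fail_refill(self);
    }
    size_t n = src->count - src->pos;
    if (n > src->chunk) n = src->chunk;
    self->cur = src->docs + src->pos;
    self->end = self->cur + n;
    src->pos += n;
    return 0;
}

static void array_buf_init(PostingBuf *buf, ArraySource *src) {
    assert(src->chunk > 0);          /* a zero-size chunk could never make progress */
    buf->cur    = NULL;
    buf->end    = NULL;
    buf->error  = 0;
    buf->refill = array_refill;
    buf->state  = src;
}

/* Scorer side: knows only PostingBuf, not which format sits behind it.
 * The common case is a pointer compare and a load, with no call per match. */
static uint32_t next_doc(PostingBuf *buf) {
    if (buf->cur == buf->end) buf->refill(buf);
    return *buf->cur++;
}

/* A trivial "Scorer": counts matches until the sentinel appears. */
static size_t count_matches(PostingBuf *buf) {
    size_t n = 0;
    while (next_doc(buf) != 0) n++;
    return n;
}
```

In this arrangement a new Index format would only need to supply its own refill function and private state, and any Scorer written against `PostingBuf` would work with it unchanged. Callers that care why the stream ended can inspect `buf->error` once after the loop instead of on every read.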