From: Peter Karman
Reply-To: peter@peknet.com
To: lucy-user@incubator.apache.org
Date: Mon, 28 Feb 2011 21:28:34 -0600
Message-ID: <4D6C67E2.6030507@peknet.com>
Subject: Re: [lucy-user] Feature question about Lucy vs. Ferret

Andrew S. Townley wrote on 2/26/11 7:17 AM:
>> You have to generate that information after the fact, by post-processing
>> the Hits that come back. Lucy, Lucene, and Ferret all have the same
>> behavior in this regard.
>>
>> Matching and scoring are highly abstracted for speed. The matching engine
>> does not scan raw document content, a la an RDBMS full table scan --
>> instead, it iterates over heavily optimized data structures devoid of
>> introspection overhead. At the end of a search, you will only have
>> documents and scores -- not sophisticated metadata about what part of the
>> subquery matched and what parts didn't and how much each matching part
>> contributed to the score. Keeping track of such metadata during the
>> matching phase would be prohibitively expensive.
>
> I can understand the need to abstract a lot of things for speed. I'm no
> search expert as I've said before, but I don't understand why at the very
> least the field information (e.g. name) can't be encoded in this data
> structure in such a way that you can determine this information at match
> time. Highlighting and offsets are a different matter, and I never thought
> it was doing a full-text scan or a table scan like an RDBMS. If I wanted
> that, I'd just use regex searches (which I do in some cases for small
> datasets).
>
> Obviously, I'm missing something here, but to me I don't see why it matters
> to keep track of fields at all if you don't have the information about which
> field matched an "all fields" or "multiple field" search query to hand when
> you get the match information back in terms of term and field. Obviously,
> actually finding the offsets is a much more expensive operation, and I'm ok
> with having to do that after the search is completed--even if I have to do my
> own matching without API support for highlighting. However, this is only
> possible if I know what term and what field and don't have to effectively
> perform the search again on the document (which is what Ferret seems to
> require).
>

I miss this feature too (native interrogation of HitDoc objects to discover
which field(s) generated the hit).

Marvin, where would be the appropriate place to extend Lucy in this way? I'm
guessing Search::Searcher and Search::MatchDoc?

-- 
Peter Karman . http://peknet.com/ . peter@peknet.com
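
The post-processing approach mentioned in the quoted reply (working out after the search which fields produced a hit) might look roughly like the sketch below. It is plain Python and does not use Lucy's, Lucene's, or Ferret's API. The `tokenize` and `matching_fields` helpers, the dict-shaped hit, and the sample field values are all assumptions made for illustration, and the regex tokenizer is only a crude stand-in for whatever analyzer the index really uses.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it on non-alphanumeric runs.

    Hypothetical stand-in for the analyzer used at index time.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matching_fields(doc: Dict[str, str], query_terms: List[str]) -> Dict[str, List[str]]:
    """Map each field name to the sorted query terms found in that field.

    Fields containing none of the terms are left out of the result.
    """
    wanted = {term.lower() for term in query_terms}
    result: Dict[str, List[str]] = {}
    for field, value in doc.items():
        found = wanted & tokenize(value)
        if found:
            result[field] = sorted(found)
    return result


# A made-up hit, represented as field name -> stored field value.
hit = {
    "title": "Lucy vs. Ferret",
    "body": "Ferret is a Ruby port of Lucene.",
    "author": "A. Townley",
}

print(matching_fields(hit, ["ferret", "lucene"]))
# {'title': ['ferret'], 'body': ['ferret', 'lucene']}
```

This re-examines each returned document, which is the cost Andrew objects to. It only stays cheap because it runs over the handful of hits actually displayed, not over the whole index.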