From: "Andrew S. Townley"
To: lucy-user@incubator.apache.org
Date: Sat, 26 Feb 2011 13:35:44 +0000
Subject: Re: [lucy-user] Feature question about Lucy vs. Ferret

On 25 Feb 2011, at 8:03 PM, Nathan Kurz wrote:

> On Fri, Feb 25, 2011 at 8:12 AM, Marvin Humphrey wrote:
>> At the end of a search, you will only have documents and scores --
>> not sophisticated metadata about what part of the subquery matched and what
>> parts didn't and how much each matching part contributed to the score.
>> Keeping track of such metadata during the matching phase would be
>> prohibitively expensive.
>
> It's only prohibitive if you don't need that data. If actually need
> it (as Andrew seems to), and are going to do it in post-processing
> anyway, it's just the cost of doing business.

Exactly. Fulltext is only one of several indexing/searching mechanisms
I have, and 99% of the time, the only reason I'm going to use the
fulltext index is to display the results to humans. Thanks to Web
search engines, users have certain expectations of being able to see
the highlighted matches, so that's the standard use case I have.
I'm happy to have the option of setting a
:your_performance_will_suck_do_you_really_want_to_do_this flag to
:yes_damnit in order to get the results I want, but I'd prefer to have
the API for dealing with the results as straightforward as possible --
oh, yeah, and I'll hardly ever be storing the information being queried
in the index itself as I've already got a place for it to live, and it
needs to be available to the other indexing methods too.

> My kick has been about making it easy to swap in non-TF/IDF scorers.
> I think part of doing so will be adding greater room for scratch data
> to Hits returned. My canonical example is that I want to to be
> possible to do alphabetical sorting of Hits by a category field. At
> some point you need a collector that can see field values, which if
> you squint right is just a special case of what Andrew wants.
>
> While I can see that argument that this is traditionally not the way
> that TF/IDF systems work, it's this potential for search/database
> hybridization that makes Lucy so attractive to me.

Not knowing that much about TF/IDF systems, all I can agree with is the
part about the fulltext/other indexing hybrid approach being an
essential part of information management in the future.

People thought things like Datablades/ORDBMS didn't make sense in RDBMS
systems either until vendors proved that you could essentially have
your cake and eat it too from a performance and flexibility
perspective. I see this as just part of the evolution of search
technology, driven by the realization that the closed-world view of
systems is a vestige of the past. Given the potential of Lucy's
approach, it seems sensible to try to design for the future here too.

Again, thanks for all the discussion and information.

Cheers,

ast
--
Andrew S. Townley
http://atownley.org