From: Marvin Humphrey
Subject: Re: NO_NORMS and TOKENIZED?
Date: Mon, 19 Feb 2007 10:54:38 -0800
To: java-user@lucene.apache.org

On Feb 19, 2007, at 8:45 AM, Yonik Seeley wrote:

> If I had to do it over again, I'd be tempted to further restrict the
> patterns so that they could be looked up from a Map rather than
> linearly.

Awesome. I know exactly how I'm going to implement this now.

> This hasn't proved to be a problem so far though, as the
> number of field-types for dynamic fields normally remains small.

For KS, there will be only one abstract class dedicated to multi-dimensional data. Users will subclass it to provide their own arbitrary field definitions. The field definition itself won't be dynamic -- only the suffix on the field name will be.

For a hashmap lookup, a prefix pattern could be restricted in one of two ways: fixed length, or terminal character. I'm inclined to go with a terminating underscore in the field name -- that allows users to choose their own prefix for maximum readability, at the cost of an additional scan.

Here's how the schema for your CNET index might look.
    # ./CNETSchema.pm

    package CNETSchema::name;
    use base 'KinoSearch::Schema::FieldSpec';

    package CNETSchema::description;
    use base 'KinoSearch::Schema::FieldSpec';
    sub similarity { return KinoSearch::Contrib::LongFieldSim->new; }

    package CNETSchema::product_id;
    use base 'KinoSearch::Schema::FieldSpec';
    sub analyzed { 0 }    # skip analysis for this field

    # Deep field: covers every field whose name starts with "attr_".
    package CNETSchema::attr;
    use base 'KinoSearch::Schema::DeepFieldSpec';
    sub analyzed { 0 }
    sub stored   { 0 }

    package CNETSchema;
    use base 'KinoSearch::Schema';
    use KinoSearch::Analysis::PolyAnalyzer;

    sub analyzer {
        return KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
    }

    __PACKAGE__->load_fields(qw( name description product_id attr ));

    1;

Then, at index time, you'll be able to do this:

    $index_writer->add_doc({
        name                         => 'Acme LT-1 Laptop',
        description                  => 'blah blah blah...',
        product_id                   => 'acme-lt-1',
        attr_weight                  => 6.3,
        attr_heat_dissipation_factor => 20,
    });

I'll need to make a few backend tweaks, but this API pretty much solves the multi-dimensional data problem. :)

Thoughts?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
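
For illustration, here is a minimal sketch of the terminating-underscore lookup described above. It is plain Perl, not KinoSearch code: the %fixed_specs and %deep_specs hashes and the spec_for() function are hypothetical names, and the analyzed/stored defaults of 1 for ordinary fields are an assumption. Exact field names are tried first, so that a fixed field such as product_id is not mistaken for a deep field with the prefix "product".

```perl
use strict;
use warnings;

# Hypothetical registries; plain hashes stand in for field spec objects.
# Ordinary fields, keyed by their exact name (defaults of 1 are assumed).
my %fixed_specs = (
    name        => { analyzed => 1, stored => 1 },
    description => { analyzed => 1, stored => 1 },
    product_id  => { analyzed => 0, stored => 1 },
);

# Deep fields, keyed by the prefix that precedes the first underscore.
my %deep_specs = (
    attr => { analyzed => 0, stored => 0 },
);

# Resolve a field name to a spec, or return nothing if none applies.
sub spec_for {
    my ($field_name) = @_;

    # Exact match wins.
    return $fixed_specs{$field_name} if exists $fixed_specs{$field_name};

    # The "additional scan": locate the first underscore.
    my $pos = index( $field_name, '_' );
    return if $pos < 1;    # no underscore, or empty prefix

    # One hash lookup on the prefix instead of a linear pattern search.
    my $prefix = substr( $field_name, 0, $pos );
    return exists $deep_specs{$prefix} ? $deep_specs{$prefix} : ();
}

for my $field (qw( name product_id attr_weight attr_heat_dissipation_factor color_red )) {
    my $spec = spec_for($field);
    printf "%-30s %s\n", $field,
        $spec ? "analyzed=$spec->{analyzed} stored=$spec->{stored}" : 'no spec';
}
```

The cost of resolving a name stays at one scan for the underscore plus at most two hash lookups, however many deep field prefixes are registered.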