From: Marvin Humphrey
Subject: Re: Flexible index format / Payloads Cont'd
Date: Thu, 3 Aug 2006 12:49:39 -0700
To: java-dev@lucene.apache.org

On Jul 31, 2006, at 8:25 AM, Nicolas Lalevée wrote:

> That looks good, but there is one restriction : it have to be per
> document.

Yes, what I laid out was per-document - for each document, the fdx
file would keep a file pointer and an integer mapping to a codec.

> In fact I was thinking about a more generic version that will allow
> the format compatibility, keeping .fdx as is :
>
> FieldData (.fdt) --> SegSize
> DocFieldData --> FieldCount, FieldCount
>
> And a default FieldsDataWriter will be the actual one, it will read
> the RawData as Bits, Value, with Value --> String | BinaryValue,....
> Then, for my app, I will provide some custom FieldsDataWriter that
> will do exactly what I want.

OK, that's quite similar, but with the info specifying how to
deserialize the document stored in fdt rather than fdx. However, I
don't think what you're describing makes the field storage in Lucene
arbitrarily extensible, since you're just going to override
FieldsWriter/FieldsReader rather than modify them so that they can
use arbitrary codecs.

I think what I want to do is turn Lucene into an Object-Oriented
Database, or at least have Lucene adopt some characteristics of an
ODBMS. However, I haven't used a real ODBMS and I'm not up on the
theory, so I can't say for sure. I've been doing a little reading
here and there on object databases, but I've been extraordinarily
busy the last few weeks and haven't been able to study it in depth.

The main point is this:

Lucene users have diverse needs for what gets stored in the
document/field storage. We've been meeting those needs by assigning
more and more bit flags. That can't continue ad infinitum. However,
we *can* meet everyone's needs by applying a variant of the "Replace
Conditionals With Polymorphism" refactoring technique...

http://xrl.us/p3kn (Link to www.eli.sdsu.edu)

Think of those bit flags as an if-else chain. Instead of all those
conditionals describing all the attributes of the Lucene Document you
want to store at that file pointer, we allow you to put whatever kind
of serialized object you desire there. Maybe it's a Lucene Document.
Maybe it's a FrenchDocument. Maybe it's a RussianDocument. Maybe it's
a wrapped-up jpg. You choose.

Instead of continually adding to the complexity of the
deserialization algorithm, we make that deserialization algorithm
user-definable.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
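
To make the idea concrete, here is a rough sketch of what "an integer
mapping to a codec" might look like in Java. It is only an
illustration under assumptions: StoredDocCodec, Utf8StringCodec,
RawBytesCodec and CodecRegistry are hypothetical names, not existing
Lucene classes, and the on-disk layout of fdx/fdt is left out
entirely. The point is just that the reader looks up a codec by the
integer stored next to the file pointer, instead of branching on bit
flags.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: one implementation per kind of stored object.
// Nothing here is part of Lucene's API.
interface StoredDocCodec {
    byte[] serialize(Object doc);
    Object deserialize(byte[] raw);
}

// Stores a document that is just a UTF-8 string.
final class Utf8StringCodec implements StoredDocCodec {
    public byte[] serialize(Object doc) {
        return ((String) doc).getBytes(StandardCharsets.UTF_8);
    }
    public Object deserialize(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}

// Stores an opaque blob (for example a wrapped-up jpg) unchanged.
final class RawBytesCodec implements StoredDocCodec {
    public byte[] serialize(Object doc) {
        return ((byte[]) doc).clone();
    }
    public Object deserialize(byte[] raw) {
        return raw.clone();
    }
}

// Maps the integer kept per document (assumed to live in fdx) to a codec.
final class CodecRegistry {
    private final Map<Integer, StoredDocCodec> codecs = new HashMap<>();

    void register(int codecId, StoredDocCodec codec) {
        codecs.put(codecId, codec);
    }

    // Polymorphic dispatch replaces the if-else chain over bit flags.
    Object read(int codecId, byte[] raw) {
        StoredDocCodec codec = codecs.get(codecId);
        if (codec == null) {
            throw new IllegalArgumentException("unknown codec id " + codecId);
        }
        return codec.deserialize(raw);
    }
}
```

Supporting a new kind of stored object then means registering another
codec under a new integer, with no change to the reading code.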