From: Nicolas Lalevée
To: java-dev@lucene.apache.org
Subject: Re: Flexible index format / Payloads Cont'd
Date: Sat, 5 Aug 2006 09:54:17 +0200

On Thursday 3 August 2006 21:49, Marvin Humphrey wrote:
> On Jul 31, 2006, at 8:25 AM, Nicolas Lalevée wrote:
> > That looks good, but there is one restriction: it has to be per
> > document.
>
> Yes, what I laid out was per-document - for each document, the fdx
> file would keep a file pointer and an integer mapping to a codec.
>
> > In fact I was thinking about a more generic version that would keep
> > format compatibility, leaving .fdx as is:
> >
> > FieldData (.fdt) --> SegSize
> > DocFieldData --> FieldCount, FieldCount
> >
> > And a default FieldsDataWriter would be the current one; it would
> > read the RawData as Bits, Value, with Value --> String | BinaryValue, ...
> > Then, for my app, I would provide a custom FieldsDataWriter that
> > does exactly what I want.
>
> OK, that's quite similar, but with the info specifying how to
> deserialize the document stored in fdt rather than fdx.

In fact, you don't have to store a "codec" at all. If the data in your app
always has the same form, you just write the data with no codec info.
For my use case, I would skip the bits about compressed/binary and store
only what I want: a pointer to a type, a pointer to a language, and the
value.

One important consequence of this design is that the index could then only
be read by my custom reader and written by my custom writer.

> However, I
> don't think what you're describing makes the field storage in Lucene
> arbitrarily extensible, since you're just going to override
> FieldsWriter/FieldsReader rather than modify them so that they can
> use arbitrary codecs.

If you override FieldsWriter/FieldsReader, you can put in whatever
writing/reading code you want, so you are in effect implementing an
arbitrary codec.

> I think what I want to do is turn Lucene into an Object-Oriented
> Database, or at least have Lucene adopt some characteristics of an
> ODBMS. However, I haven't used a real ODBMS and I'm not up on the
> theory, so I can't say for sure.
> I've been doing a little reading here and there on object databases,
> but I've been extraordinarily busy the last few weeks and haven't been
> able to study it in depth.
>
> The main point is this:
>
> Lucene users have diverse needs for what gets stored in the document/
> field storage. We've been meeting those needs by assigning more and
> more bit flags. That can't continue ad infinitum. However, we
> *can* meet everyone's needs by applying a variant of the "Replace
> Conditionals With Polymorphism" refactoring technique...
>
> http://xrl.us/p3kn (Link to www.eli.sdsu.edu)
>
> Think of those bit flags as an if-else chain. Instead of all those
> conditionals describing all the attributes of the Lucene Document you
> want to store at that file pointer, we allow you to put whatever kind
> of serialized object you desire there. Maybe it's a Lucene
> Document. Maybe it's a FrenchDocument. Maybe it's a
> RussianDocument. Maybe it's a wrapped-up jpg. You choose.
>
> Instead of continually adding to the complexity of the
> deserialization algorithm, we make that deserialization algorithm
> user-definable.

In fact, this is exactly my point. :-)

If people think it is interesting, I can try to build a prototype.

cheers,
Nicolas
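The "Replace Conditionals With Polymorphism" idea discussed in this thread can be illustrated with a minimal, self-contained Java sketch. It assumes the per-document layout Marvin describes (each fdx entry holds a file pointer plus an integer codec id, and the fdt holds whatever bytes that codec wrote) and models both files in memory. All names here (StoredDocCodec, StringCodec, TypedValueCodec, TypedValue, CodecStore) are hypothetical and are not part of Lucene. The sketch uses java.io's DataOutput/DataInput rather than Lucene's FieldsWriter/FieldsReader, so treat it as an illustration of the design, not a patch.

```java
import java.io.*;
import java.util.*;

// Hypothetical names throughout; this is not Lucene's API.

/** Hypothetical codec: knows how to (de)serialize one kind of stored document. */
interface StoredDocCodec {
    int id(); // the integer written next to the file pointer in the "fdx" entry
    void write(DataOutput out, Object doc) throws IOException;
    Object read(DataInput in) throws IOException;
}

/** Stores a document as a single string. */
final class StringCodec implements StoredDocCodec {
    public int id() { return 0; }
    public void write(DataOutput out, Object doc) throws IOException { out.writeUTF((String) doc); }
    public Object read(DataInput in) throws IOException { return in.readUTF(); }
}

/** Hypothetical fixed-shape document: type index, language index, value. */
final class TypedValue {
    final int type;
    final int lang;
    final String value;
    TypedValue(int type, int lang, String value) {
        this.type = type;
        this.lang = lang;
        this.value = value;
    }
}

/** Writes exactly the three fields; no compressed/binary flag bits are needed. */
final class TypedValueCodec implements StoredDocCodec {
    public int id() { return 1; }
    public void write(DataOutput out, Object doc) throws IOException {
        TypedValue tv = (TypedValue) doc;
        out.writeInt(tv.type);
        out.writeInt(tv.lang);
        out.writeUTF(tv.value);
    }
    public Object read(DataInput in) throws IOException {
        int type = in.readInt();
        int lang = in.readInt();
        return new TypedValue(type, lang, in.readUTF());
    }
}

/** Toy in-memory model: "fdt" holds the bytes, "fdx" holds (file pointer, codec id) per document. */
final class CodecStore {
    private final Map<Integer, StoredDocCodec> codecs = new HashMap<>();
    private final ByteArrayOutputStream fdt = new ByteArrayOutputStream();
    private final DataOutputStream fdtOut = new DataOutputStream(fdt);
    private final List<int[]> fdx = new ArrayList<>();

    /** The reader side must know every codec id that may appear in "fdx". */
    CodecStore(StoredDocCodec... known) {
        for (StoredDocCodec c : known) codecs.put(c.id(), c);
    }

    /** Appends a document and returns its document number. */
    int add(StoredDocCodec codec, Object doc) {
        try {
            int pointer = fdtOut.size(); // bytes written so far = start of this document
            codec.write(fdtOut, doc);
            fdx.add(new int[] {pointer, codec.id()});
            return fdx.size() - 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up the stored codec id and lets that codec decode: dispatch instead of an if-else chain. */
    Object get(int docNum) {
        int[] entry = fdx.get(docNum);
        StoredDocCodec codec = codecs.get(entry[1]);
        if (codec == null) throw new IllegalStateException("no codec registered for id " + entry[1]);
        byte[] bytes = fdt.toByteArray();
        try {
            return codec.read(new DataInputStream(
                new ByteArrayInputStream(bytes, entry[0], bytes.length - entry[0])));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Decoding a document becomes a lookup of its codec id followed by a call to that codec's read method, so supporting a new kind of stored document means registering another codec rather than adding another flag bit. The trade-off noted in the thread still holds: data written with a custom codec can only be read back by a reader that has the same codec registered.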