From: "Robert Engels"
To: "Lucene Developers List"
Subject: RE: mg4j - Managing Gigabyte for Java
Date: Thu, 16 Sep 2004 14:04:51 -0500

I think the best way to move in this direction is to make IndexReader and
IndexWriter pure interfaces. It will go a long way towards these sorts of
changes, since the API at the interface level will need configuration
(capability query) methods in order to support using any 'lucene tools'
with any 'lucene index'.

I know it has been discussed before, but are interfaces for IndexReader and
IndexWriter going to make it onto the list for 1.9/2.0?

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org]
Sent: Thursday, September 16, 2004 1:56 PM
To: Lucene Developers List
Subject: Re: mg4j - Managing Gigabyte for Java

Antonio Gulli wrote:
> Just a question: my personal experience with a commercial engine I
> partly developed is that the "continuation bit" (aka the AltaVista
> solution) is a good and efficient solution w.r.t. gamma code, delta code
> and other codes used for variable-length int representation (see MG).
>
> Given an int, say n, the continuation bit approach treats each byte as
> 7 bits of data + 1 bit that says whether the next byte is also used to
> represent n.

This is what Lucene uses for the reasons you mention: it is a good
compromise between compression and performance.

Long-term I'd like to make Lucene's posting format extensible. In addition
to altering the compression method, the granularity of the index should be
flexible. Currently postings for all indexed fields consist of
<doc, freq, <position>*> tuples. Instead, folks should be able to have
postings like:
  . <doc> for pure boolean matching only
  . <doc, freq> for vector matching, no phrases
  . <doc, freq, <position, weight>*> for boosting term occurrences by,
    e.g., position in document, bolding, headings, etc.

Extending Lucene to efficiently and flexibly support this will be a design
challenge, but I think it will benefit lots of applications.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
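
The "continuation bit" coding discussed in the thread above can be sketched
roughly as follows. This is a minimal illustration of the general technique
(7 payload bits per byte, high bit set when another byte follows), not
Lucene's actual source. The class name VByteSketch and the method names
writeVInt and readVInt are assumptions chosen for this example, as is the
choice to emit the low-order 7-bit group first.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative sketch of continuation-bit integer coding; all names are hypothetical. */
final class VByteSketch {

    /** Appends a non-negative int using 7 payload bits per byte, low-order group first. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        // While more than 7 bits remain, emit the low 7 bits with the high bit set.
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        // Last byte: a clear high bit marks the end of this integer.
        out.write(value);
    }

    /** Decodes one integer starting at pos[0] and advances pos[0] past it. */
    static int readVInt(byte[] buf, int[] pos) {
        int value = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;   // place this 7-bit group
            if ((b & 0x80) == 0) {
                return value;               // continuation bit clear: done
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] samples = {0, 127, 128, 16384, Integer.MAX_VALUE};
        for (int n : samples) {
            writeVInt(out, n);
        }
        byte[] encoded = out.toByteArray();
        int[] pos = {0};
        for (int n : samples) {
            System.out.println(n + " -> " + readVInt(encoded, pos));
        }
        System.out.println("encoded bytes: " + encoded.length);
    }
}
```

Small values, which dominate when postings store gaps between document
numbers or positions, fit in a single byte, and decoding needs only shifts
and masks on whole bytes. That is the compression/performance trade-off
mentioned in the thread; bit-aligned codes such as gamma or delta can be
more compact but require bit-level reads.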