Message-ID: <80bf0fef0610250547v39959afbyddc3baa7f42d854@mail.gmail.com>
Date: Wed, 25 Oct 2006 08:47:24 -0400
From: "John Fawcett"
To: java-dev@lucene.apache.org
Subject: Fwd: Client-Server Lucene - DocumentWriter
In-Reply-To: <653FA7B2D9C77C45812EB4717D558C5201F91FDC@MI8NYCMAIL14.Mi8.com>
References: <653FA7B2D9C77C45812EB4717D558C5201F91FDC@MI8NYCMAIL14.Mi8.com>

---------- Forwarded message ----------
From: John Fawcett
Date: Wed, 25 Oct 2006 08:39:23 -0400
Subject: Client-Server Lucene - DocumentWriter
To: fawcett@gmail.com

Hi,

I have a design challenge in my own application's use of Lucene, which triggered an idea for distributed Lucene indexing. Below, I've summarized the design challenge, and then the indexing idea.

My team is working on a client/server application. The server is a Java application, and the client is written in C#/.NET. Right now we are adding support for offline operation of the client. Search is part of this work, so we have been working with Lucene.Net to port some of our online search capabilities to offline use.

The client only holds a subset of the data held on the server, so we'd like to move a subset of the search index to the client. There are two types of transfers: bulk and incremental. Our goal in both is to offload as much work as possible from the client to the server.

Bulk transfers happen when a client is initializing for offline use, or resyncing after returning online. In these scenarios we plan to create a new index on the server and just send the files to the client. The client will then have to perform an index merge.

Incremental adds happen when the client application is online. New documents are transferred to the client asynchronously. Currently, we are transferring a document's extracted text.
However, the client still has to perform analysis, inversion, and addition to the index.

Looking through the code for IndexWriter, I found the DocumentWriter class. DocumentWriter does the inversion and stores the result in a set of integer arrays and an array of "Posting" objects. Looking through the class, it seems like the inversion info could be serialized from server to client pretty easily. The serialized data from DocumentWriter would be a portable "index record" for a single document. Our hope is that we can send this index record from the server to the client. The idea is to reduce the work on the client to only the insertion of the inverted document into the local index.

Having a portable "index record" for an individual document seems very useful - especially for distributed indexing. I can imagine running a farm of indexers that only invert documents and send them to a set of search machines that maintain indexes and answer search queries.

Is this something that could be added to the Lucene framework? Is the "index record" data calculated in DocumentWriter in any way dependent on the contents of the index? Will this actually save us many client cycles?

Thanks,
fawce
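
P.S. To make the "index record" idea a little more concrete, here is a rough sketch of what I have in mind. It does not use DocumentWriter or any other Lucene class: IndexRecord, invert, toBytes and fromBytes are all hypothetical names, the tokenizer is a trivial stand-in for a real Analyzer, and the wire format is just an assumption for illustration. The server would call invert and toBytes; the client would call fromBytes and then only has to add the postings to its local index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only; none of these names exist in Lucene.
final class IndexRecord {
    // term -> token positions for one field of one document, sorted by term
    final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    // Server side: stand-in for analysis plus inversion.
    // Lowercases the text and splits on anything that is not a-z or 0-9.
    static IndexRecord invert(String text) {
        IndexRecord record = new IndexRecord();
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // leading separator produces one empty token
            }
            record.postings.computeIfAbsent(token, t -> new ArrayList<>()).add(position++);
        }
        return record;
    }

    // Server side: serialize as termCount, then (term, freq, positions...) per term.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(postings.size());
            for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().size());
                for (int position : entry.getValue()) {
                    out.writeInt(position);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Client side: rebuild the record without re-running analysis or inversion.
    static IndexRecord fromBytes(byte[] bytes) {
        IndexRecord record = new IndexRecord();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int termCount = in.readInt();
            for (int i = 0; i < termCount; i++) {
                String term = in.readUTF();
                int freq = in.readInt();
                List<Integer> positions = new ArrayList<>(freq);
                for (int j = 0; j < freq; j++) {
                    positions.add(in.readInt());
                }
                record.postings.put(term, positions);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

Since our client is C#, a real format would need to be something both sides can read; writeUTF uses Java's modified UTF-8, so the .NET side would need a matching decoder or a different string encoding. The sketch also ignores stored fields, norms and multiple fields per document.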