From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Date: Tue, 18 Dec 2007 14:27:43 -0000
Message-ID: <20071218142743.25430.49968@eos.apache.org>
Subject: [Lucene-hadoop Wiki] Trivial Update of "DistributedLucene" by MarkButler

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by MarkButler:
http://wiki.apache.org/lucene-hadoop/DistributedLucene

------------------------------------------------------------------------------
  }
  }}}

+ == Implementation Notes ==
+
+ DLucene does not use HDFS itself but is heavily inspired by it, because the files used in Lucene indexes are quite different from the files that HDFS was designed for. It uses a similar replication algorithm and, where possible, reuses HDFS code, although it was necessary to make some local changes to the visibility of some classes and methods.
+
+ Unlike HDFS, it currently uses a stateless name node. The heartbeat information sent by each worker contains a list of all the indexes it owns and the current status of those indexes, so in the event of a master failure it should be possible to swap over to another master. The disadvantage is that this results in more network traffic per heartbeat.
+
+ Both the master and the workers have a heartbeat architecture. On a worker heartbeat, the worker sends information about its status to the master. In addition, a second thread examines a queue of replication tasks and performs them one at a time (there may be room for optimisation here). On a master heartbeat, the master performs failure detection and computes a replication plan. The relevant segment of this plan is then sent back to each worker on its next heartbeat.
+
+ To simplify the code, I have an abstract node class that both the worker and the master inherit from.
+
+ == Next Steps ==
+
+ Design the client API.
+
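
To illustrate the heartbeat design described in the implementation notes above, here is a minimal sketch of a stateless master. It is not the DLucene code: the class and method names (`HeartbeatSketch`, `Master`, `onWorkerHeartbeat`, `detectFailures`, `computeReplicationPlan`), the integer index version used as "status", and the fixed timeout and replication factor are all assumptions made for the example. Because each worker heartbeat carries the full list of indexes the worker owns, a freshly started master can rebuild the same view from the next round of heartbeats.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class HeartbeatSketch {

    /** Hypothetical stateless master: everything it knows comes from worker heartbeats. */
    static class Master {
        private final long timeoutMillis;      // assumed failure-detection timeout
        private final int replicationFactor;   // assumed target number of copies per index
        private final Map<String, Long> lastSeen = new HashMap<>();
        private final Map<String, Map<String, Integer>> ownedIndexes = new HashMap<>();

        Master(long timeoutMillis, int replicationFactor) {
            this.timeoutMillis = timeoutMillis;
            this.replicationFactor = replicationFactor;
        }

        /** Worker heartbeat: the full index list (name -> version) replaces any earlier report. */
        void onWorkerHeartbeat(String workerId, Map<String, Integer> indexVersions, long now) {
            lastSeen.put(workerId, now);
            ownedIndexes.put(workerId, new HashMap<>(indexVersions));
        }

        /** Master heartbeat, step 1: forget workers that have been silent longer than the timeout. */
        List<String> detectFailures(long now) {
            List<String> failed = new ArrayList<>();
            Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> entry = it.next();
                if (now - entry.getValue() > timeoutMillis) {
                    failed.add(entry.getKey());
                    ownedIndexes.remove(entry.getKey());
                    it.remove();
                }
            }
            Collections.sort(failed);
            return failed;
        }

        /** Master heartbeat, step 2: for each under-replicated index, pick workers that should copy it. */
        Map<String, List<String>> computeReplicationPlan() {
            // Invert the worker -> indexes view into index -> holders.
            Map<String, Set<String>> holders = new TreeMap<>();
            for (Map.Entry<String, Map<String, Integer>> entry : ownedIndexes.entrySet()) {
                for (String index : entry.getValue().keySet()) {
                    holders.computeIfAbsent(index, k -> new TreeSet<>()).add(entry.getKey());
                }
            }
            // The plan is keyed by target worker, so each worker's segment
            // can be handed back on that worker's next heartbeat.
            Map<String, List<String>> plan = new TreeMap<>();
            Set<String> liveWorkers = new TreeSet<>(ownedIndexes.keySet());
            for (Map.Entry<String, Set<String>> entry : holders.entrySet()) {
                int missing = replicationFactor - entry.getValue().size();
                for (String worker : liveWorkers) {
                    if (missing <= 0) {
                        break;
                    }
                    if (!entry.getValue().contains(worker)) {
                        plan.computeIfAbsent(worker, k -> new ArrayList<>()).add(entry.getKey());
                        missing--;
                    }
                }
            }
            return plan;
        }
    }

    public static void main(String[] args) {
        Master master = new Master(3000, 2);
        Map<String, Integer> w1Indexes = new HashMap<>();
        w1Indexes.put("indexA", 1);
        master.onWorkerHeartbeat("w1", w1Indexes, 0);
        master.onWorkerHeartbeat("w2", new HashMap<>(), 0);
        System.out.println("failed workers: " + master.detectFailures(1000));
        System.out.println("replication plan: " + master.computeReplicationPlan());
    }
}
```

The sketch makes the trade-off from the notes visible: the master keeps no durable state, so every heartbeat must repeat the worker's whole index list, which is where the extra network traffic comes from.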