hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "DistributedLucene" by MarkButler
Date Tue, 18 Dec 2007 14:27:43 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by MarkButler:
http://wiki.apache.org/lucene-hadoop/DistributedLucene

------------------------------------------------------------------------------
  }
  }}}
  
+ == Implementation Notes ==
+ 
+ Rather than using HDFS directly, DLucene is heavily inspired by it. This is because the files
used in Lucene indexes are quite different from the files that HDFS was designed for. DLucene uses
a similar replication algorithm, and reuses HDFS code where possible, although it was necessary
to make some local changes to the visibility of some classes and methods.
+ 
+ Unlike HDFS, it currently uses a stateless name node. In the event of a failure, the heartbeat
information sent by each worker contains a list of all the indexes it owns, together with the
current status of those indexes. This means it should be possible to swap over masters. The
disadvantage is that this results in more network traffic per heartbeat.
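As a rough illustration of the stateless design, a heartbeat payload might carry the worker's full index list, so a newly promoted master can rebuild its cluster view from heartbeats alone. The class and field names below (`HeartbeatInfo`, `IndexState`, and so on) are illustrative assumptions, not the actual DLucene API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a heartbeat payload that makes the master stateless.
// All names here are illustrative, not the real DLucene classes.
class HeartbeatInfo {
    enum IndexState { LIVE, REPLICATING, UNCOMMITTED }

    static class IndexStatus {
        final String indexId;
        final long version;
        final IndexState state;
        IndexStatus(String indexId, long version, IndexState state) {
            this.indexId = indexId;
            this.version = version;
            this.state = state;
        }
    }

    private final String workerAddress;
    private final List<IndexStatus> indexes = new ArrayList<>();

    HeartbeatInfo(String workerAddress) { this.workerAddress = workerAddress; }

    void addIndex(String id, long version, IndexState state) {
        indexes.add(new IndexStatus(id, version, state));
    }

    // Because every heartbeat carries the complete index list, a replacement
    // master needs no persisted state: it just accumulates these payloads.
    List<IndexStatus> getIndexes() { return indexes; }
    String getWorkerAddress() { return workerAddress; }
}
```

This is also where the extra network cost shows up: the whole list is resent on every heartbeat rather than only deltas.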
+ 
+ Both the master and the workers have a heartbeat architecture. On a worker heartbeat, the worker
sends information about its status to the master. In addition, a second thread examines a queue
of replication tasks and performs them one at a time (there may be room for optimisation here).
On a master heartbeat, the master performs failure detection and computes a replication plan. The
relevant segment of this plan is then sent back to each worker on its next heartbeat.
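The worker side of this could be sketched as below, assuming a blocking queue drained by a dedicated replicator thread; the class and method names (`ReplicationWorker`, `startReplicator`) are hypothetical, not the actual DLucene code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the worker's second thread: replication tasks
// arrive in a queue and are performed strictly one at a time.
class ReplicationWorker {
    private final BlockingQueue<Runnable> replicationTasks = new LinkedBlockingQueue<>();

    // Called when a segment of the master's replication plan arrives.
    void enqueue(Runnable task) { replicationTasks.add(task); }

    // Second thread: drains the queue sequentially, blocking when it is empty.
    Thread startReplicator() {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Runnable task = replicationTasks.take(); // blocks until a task arrives
                    task.run();                              // one task at a time
                } catch (InterruptedException e) {
                    return; // interrupted: shut the replicator down
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Running tasks strictly sequentially is the simplest correct behaviour; the "optimisations" mentioned above would presumably batch or parallelise transfers from distinct source workers.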

+ 
+ I have an abstract node class that both the worker and the master inherit from, to simplify
the code.
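One plausible shape for that shared base class is a template that owns the heartbeat loop and lets each subclass supply its per-tick work; the names here (`Node`, `onHeartbeat`) are guesses for illustration, not the real class:

```java
// Hypothetical sketch: a shared base class for master and worker.
// Each subclass implements the work done on every heartbeat tick.
abstract class Node {
    protected final long heartbeatIntervalMillis;

    protected Node(long heartbeatIntervalMillis) {
        this.heartbeatIntervalMillis = heartbeatIntervalMillis;
    }

    // Worker: report status and run replication tasks.
    // Master: detect failures and compute the replication plan.
    protected abstract void onHeartbeat();

    Thread startHeartbeat() {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                onHeartbeat();
                try {
                    Thread.sleep(heartbeatIntervalMillis);
                } catch (InterruptedException e) {
                    return; // stop the heartbeat loop on interrupt
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```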
+ 
+ == Next Steps ==
+ 
+ Design the client API. 
+ 
