hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "DistributedLucene" by MarkButler
Date Wed, 19 Dec 2007 12:42:55 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by MarkButler:
http://wiki.apache.org/lucene-hadoop/DistributedLucene

------------------------------------------------------------------------------
  
  === Issues to be discussed ===
  
+ ==== 6. Broadcasts versus IPC ====
+ 
+ Currently Hadoop does not support broadcasts, and there are problems getting broadcasts to work across clusters. Do we need to use broadcasts, or can we use the same approach as HDFS and HBase?
+ 
+ Current approach: does not use broadcasts. 
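
A minimal sketch of what "no broadcasts" can mean in practice: the caller simply makes an ordinary point-to-point call to each known worker, as HDFS and HBase do. WorkerProtocol and its ping() method are assumed names for illustration, not an existing Hadoop or DistributedLucene interface.

{{{
// Hypothetical sketch: a "broadcast" replaced by one point-to-point
// call per worker. WorkerProtocol and ping() are assumed names.
import java.util.List;

public class NoBroadcastSketch {
    interface WorkerProtocol {
        void ping();                       // assumed per-worker RPC
    }

    static void contactAll(List<WorkerProtocol> workers) {
        for (WorkerProtocol worker : workers) {
            worker.ping();                 // one IPC call per worker
        }
    }
}
}}}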
+ 
  ==== 1. How do searches work? ====
  
  Searches could be broadcast to one index with each id and return merged results. The client
will load-balance both searches and updates.
  
+ Current approach: Sharding is implemented in the client API; the master and the workers know nothing about shards. The client gets a list of all indexes, then selects replicas at random to query (load balancing). The replicas return their results and the client API aggregates them.
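
A minimal sketch of the client-side sharding just described, assuming the client is handed a map from shard id to the replicas serving that shard; SearchableIndex, its search() method, and the result type are hypothetical names, not the actual client API.

{{{
// Hedged sketch of client-side sharded search: pick one replica per shard
// at random (load balancing) and aggregate the per-shard results.
// SearchableIndex and search() are assumed names for illustration.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ShardedSearchSketch {
    interface SearchableIndex {
        List<String> search(String query, int n);   // assumed signature
    }

    private final Random random = new Random();

    // shards maps a shard id to the replicas that serve that shard
    public List<String> search(Map<String, List<SearchableIndex>> shards,
                               String query, int n) {
        List<String> merged = new ArrayList<>();
        for (List<SearchableIndex> replicas : shards.values()) {
            // the client, not the master, does the load balancing
            SearchableIndex replica =
                replicas.get(random.nextInt(replicas.size()));
            merged.addAll(replica.search(query, n));
        }
        // real code would merge by score and truncate to n
        return merged;
    }
}
}}}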
+ 
  ==== 2. How do deletions work? ====
  
  Deletions could be broadcast to all slaves. That would probably be fast enough. Alternatively, indexes could be partitioned by a hash of each document's unique id, permitting deletions to be routed to the appropriate slave.
+ 
+ Current approach: On non-sharded indexes, deletions are sent directly to the worker. On sharded ones, they work like the searches described above.
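
For the hash-partitioned alternative mentioned above, a hedged sketch of routing a deletion by a hash of the document's unique id, so that only the worker holding that shard is contacted; Worker and deleteDocument() are assumed names, not the real protocol.

{{{
// Illustrative only: route a delete to one worker chosen by hashing the
// document id, instead of broadcasting it to every slave.
import java.util.List;

public class DeleteRoutingSketch {
    interface Worker {
        void deleteDocument(String docId);   // assumed remote call
    }

    static void delete(List<Worker> shardWorkers, String docId) {
        int shard = Math.floorMod(docId.hashCode(), shardWorkers.size());
        shardWorkers.get(shard).deleteDocument(docId);
    }
}
}}}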
  
  ==== 3. How does update work? ====
  
@@ -105, +115 @@

  ==== 5. How do commits work? ====
  
  It seems like the master might want to be involved in commits too, or maybe we just rely on the slave-to-master heartbeat to kick off immediately after a commit so that index replication can be initiated? I like the latter approach. New versions are only published as frequently as clients poll the master for updated IndexLocations. Clients keep a cache of both readable and updatable index locations that is periodically refreshed.
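
The polling behaviour described here could look roughly like the following client-side cache, refreshed on a fixed interval; MasterProtocol, getIndexLocations() and the refresh period are assumptions for illustration, not the real interfaces.

{{{
// Hedged sketch: a client-side cache of index locations, refreshed by
// polling the master, so new index versions become visible at the
// polling interval. MasterProtocol and getIndexLocations() are assumed.
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class IndexLocationCache {
    interface MasterProtocol {
        List<String> getIndexLocations();   // assumed call returning addresses
    }

    private volatile List<String> locations = List.of();

    public IndexLocationCache(MasterProtocol master, long refreshSeconds) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        // periodically re-fetch the readable/updatable locations
        scheduler.scheduleAtFixedRate(
            () -> locations = master.getIndexLocations(),
            0, refreshSeconds, TimeUnit.SECONDS);
    }

    public List<String> currentLocations() {
        return locations;
    }
}
}}}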
- 
- ==== 6. Broadcasts versus IPC ====
- 
- Currently Hadoop does not support broadcasts, and there are problems getting broadcasts
to work across clusters. Do we need to use broadcasts or can we use the same approach as HDFS
and Hbase?
  
  ==== 7. Finding updateable indexes ====
  
