lucene-dev mailing list archives

From Tim Jones <>
Subject Re: Lucene-based Distributed Index Leveraging Hadoop
Date Tue, 12 Feb 2008 05:36:16 GMT
I am guessing the ideas behind not putting the indexes in HDFS are
(1) to maximize performance, and (2) that the indexes are relatively
transient: the data they are created from could live in HDFS, but the
indexes themselves stay local. To avoid having to recreate them, a
backup copy could be kept in HDFS.

Since a goal is to be able to update them (frequently), this seems  
like a good approach to me.
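The "local index, backup in HDFS" idea can be sketched with the stock Hadoop fs shell. This is just a sketch under assumed paths: /var/lucene/index (local index directory) and /backups (HDFS backup area) are hypothetical.

```shell
# Copy the locally built Lucene index into HDFS as a dated backup.
# /var/lucene/index and /backups are hypothetical paths.
hadoop fs -mkdir /backups
hadoop fs -put /var/lucene/index /backups/index-$(date +%Y%m%d)

# To recover after losing a node, pull a backup copy back onto
# local disk (substitute the date of the backup to restore):
hadoop fs -get /backups/index-YYYYMMDD /var/lucene/index
```

Since the backup is only a recovery aid, it need not be perfectly current; it just bounds how much indexing work has to be redone after a failure.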


Andrzej Bialecki wrote:
> Doug Cutting wrote:
>> My primary difference with your proposal is that I would like to  
>> support online indexing.  Documents could be inserted and removed  
>> directly, and shards would synchronize changes amongst replicas,  
>> with an "eventual consistency" model.  Indexes would not be stored  
>> in HDFS, but directly on the local disk of each node.  Hadoop would  
>> perhaps not play a role. In many ways this would resemble CouchDB,  
>> but with explicit support for sharding and failover from the outset.
> It's true that searching over HDFS is slow - but I'd hate to lose  
> all other HDFS benefits and have to start from scratch ... I wonder  
> what would be the performance of FsDirectory over an HDFS index that  
> is "pinned" to a local disk, i.e. a full local replica is available,  
> with block size of each index file equal to the file size.
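The "pinned" layout could be approximated from the shell: HDFS allows the block size to be chosen per file at creation time, so writing each index file with a block size at least as large as the file itself keeps every file in a single block, which a replica node can then hold entirely on local disk. A hedged sketch, assuming the fs shell accepts the generic -D option and using a hypothetical segment file _0.cfs:

```shell
# Write one index file into HDFS with its block size set to its own
# size, rounded up, so the whole file lands in a single HDFS block.
# _0.cfs is a hypothetical Lucene segment file; dfs.block.size is in
# bytes and must be a multiple of io.bytes.per.checksum (512 by default).
SIZE=$(stat -c %s _0.cfs)
BLOCK=$(( (SIZE / 512 + 1) * 512 ))
hadoop fs -D dfs.block.size=$BLOCK -put _0.cfs /indexes/shard0/_0.cfs
```

Whether FsDirectory reads over such a layout would approach local FSDirectory performance is exactly the open question; the single-block layout only removes cross-node block fetches, not the RPC overhead of going through the DFS client.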

