hadoop-common-user mailing list archives

From Stefan Groschupf ...@101tec.com>
Subject Re: HBase implementation question
Date Wed, 02 Jan 2008 11:45:45 GMT
Hi,
> Reads are probably a bit more complicated than writes. A read
> operation first checks the cache and may satisfy the request
> directly from the cache. If not, the operation checks the
> newest MapFile for the data, then the next newest, ...,
> to the oldest, stopping when the requested data has been
> retrieved. Because a random read (or even a sequential read
> that is not a scan) can end up checking multiple files
> for data, it is considerably slower than either a write or a
> sequential scan (think of a scan as working with a cursor
> in a traditional database).

Sorry, just to double check that I understand correctly: the number
of files that need to be checked for a read is related to the
compaction threshold, since all files are merged into one big sorted
file by the compaction thread after a given time?
Any idea how many files usually need to be checked on average?
Would it make sense here to work with key spaces, similar to the
map/reduce partitioner, to keep the number of files that need to be
read smaller?
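
Just to make sure I am reading the lookup order right, here is a
rough sketch of how I picture the read path (the class and field
names below are my own placeholders, not the actual HBase code):

  import java.util.List;
  import java.util.NavigableMap;
  import java.util.TreeMap;

  // Placeholder sketch of the read path: check the in-memory cache
  // first, then the on-disk MapFiles from newest to oldest.
  class ReadPathSketch {
      private final NavigableMap<String, byte[]> cache =
          new TreeMap<String, byte[]>();
      private final List<NavigableMap<String, byte[]>> mapFilesNewestFirst;

      ReadPathSketch(List<NavigableMap<String, byte[]>> mapFilesNewestFirst) {
          this.mapFilesNewestFirst = mapFilesNewestFirst;
      }

      byte[] get(String key) {
          // 1. Satisfy the read from the cache if possible.
          byte[] value = cache.get(key);
          if (value != null) {
              return value;
          }
          // 2. Otherwise walk the MapFiles, newest to oldest,
          //    stopping as soon as the key is found.
          for (NavigableMap<String, byte[]> file : mapFilesNewestFirst) {
              value = file.get(key);
              if (value != null) {
                  return value;
              }
          }
          return null; // not in the cache or in any file
      }
  }

If that is roughly right, then the number of iterations in that loop
is what the compaction threshold keeps bounded, correct?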

Thanks,
Stefan

