hbase-user mailing list archives

From Bryan McCormick <br...@readpath.com>
Subject Data Local Questions
Date Wed, 17 Feb 2010 08:10:52 GMT
Quick question about data-local vs. rack-local tasks when running
MapReduce jobs against HBase. I've just run a job against a table that
was split into 1,645 tasks. The job page reports that 1,445 of those
tasks were rack local, compared to 200 that were data local. I'm taking
these counters to mean that most of the tasks ran on a server other
than the one hosting the relevant regionserver. Is it possible, or are
there plans, to add logic to the scheduler so that it prefers to run
each task on the same server as the regionserver whenever it can?

Within HBase itself, is there a similar way to tell whether a
regionserver has the files it needs to serve a region on its local
datanode, rather than having to cross the network to read them?

I know that when you're writing new data into a table and it splits,
the default is for the first replica to be written to the local
datanode. But after a fairly large table has been brought up and down
several times, with all of its regions being reassigned, is there any
logic at assignment time that places regions on a data-local server?

