Message-ID: <479A2D91.60101@duboce.net>
Date: Fri, 25 Jan 2008 10:42:25 -0800
From: stack <stack@duboce.net>
To: core-user@hadoop.apache.org
Subject: Re: Region offline issues
References: <1201206231.16299.36.camel@mharris1.jumptap.com>
	 <34506233-C375-4D48-8CFE-897DE406847B@rapleaf.com>
	 <479904A2.3080007@duboce.net>
 <1201278063.16299.72.camel@mharris1.jumptap.com>
In-Reply-To: <1201278063.16299.72.camel@mharris1.jumptap.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Marc Harris wrote:
> To Bryan's points:
>   
...
> 2) There does not appear to be anything else significant in the logs. I
> can send them to you if you like but I think my previous comment may
> cause you to be less interested.
>
>   
Send them to me if you don't mind.  I'd like to look at them to see what
was going on in the regionserver such that the client couldn't get an
update in during a full run of retries (I'd guess it has to do with
HADOOP-2712 and HADOOP-2615).
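
For anyone following along, here is a rough sketch of what "a run of all the retries" means on the client side. It is a generic illustration in Python, not the actual HBase client code: the names `update_with_retries`, `UpdateRejected`, `max_retries` and `pause_seconds` are made up for the example.

```python
import time


class UpdateRejected(Exception):
    """Hypothetical error: the server refused or could not take the update."""


def update_with_retries(send_update, max_retries=5, pause_seconds=0.0):
    """Call send_update() until it succeeds or the retries are used up.

    send_update is any zero-argument callable that raises UpdateRejected
    on failure. Both numeric parameters are illustrative defaults only.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return send_update()
        except UpdateRejected:
            if attempt == max_retries:
                # Every attempt failed: this is the point where the
                # client gives up and the caller sees the error.
                raise
            time.sleep(pause_seconds)
```

If the server stays blocked for longer than `max_retries * pause_seconds`, the update fails even though nothing is wrong with the request itself.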


> 3) About success running on a 13 node cluster. I think that's really the
> question. Should I expect this data load to work reasonably well on a
> single node cluster or not?
>   

I don't know about 'reasonably well'.  Single-node is sub-optimal but it 
should be possible to load it w/ a decent amount of data w/o failures.

> To stack's points:
>
> 4) Could you explain what you mean by "forever to load"? During the
> phases it was working I would get about 100 rows per second, which was
> sufficient for me. Also could you explain why setting up a mapreduce job
> would make things more efficient in a single server setup? Are things
> not limited by disk access either way?
>   
Pardon me.  I presumed multiple cores and was suggesting MapReduce as one
means of running multiple concurrent upload clients.  Yeah, disk is a
bottleneck either way.
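
To make "multiple concurrent upload clients" concrete, here is a minimal sketch that uses plain threads rather than a MapReduce job. Everything in it is an assumption for illustration: `upload_batch` is a hypothetical stand-in for a real client that would write each row to the table, and `n_clients` is an arbitrary number.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, n_clients):
    """Deal the rows round-robin into n_clients batches."""
    return [rows[i::n_clients] for i in range(n_clients)]


def upload_batch(batch):
    """Hypothetical upload client: a real one would write each row out.

    Here it only reports how many rows it was handed.
    """
    return len(batch)


def concurrent_upload(rows, n_clients=4):
    """Run one upload client per batch and return the total rows handled."""
    batches = split_rows(rows, n_clients)
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        return sum(pool.map(upload_batch, batches))
```

More clients only help while something other than disk is the limit. On a single disk-bound server, the extra clients mostly end up waiting on the same disk.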


> 5) When a regionserver judges itself overloaded and blocks updates, can
> another regionserver take up the load for all subsequent updates, or do
> certain updates (based on row key presumably) have to go to that
> regionserver?
>   

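The latter.  An update for a given row key has to go to the regionserver
currently serving the region that holds that row.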
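
A small sketch of why that is, assuming a table split into regions by sorted start key, each region served by exactly one regionserver. The region layout, the server names and `server_for_row` are all invented for the example; this is not HBase code.

```python
import bisect

# Hypothetical layout: (region start key, regionserver), sorted by start key.
# The empty string is the start key of the first region.
REGIONS = [("", "rs1"), ("g", "rs2"), ("p", "rs1")]


def server_for_row(row_key, regions=REGIONS):
    """Return the regionserver serving the region that contains row_key."""
    start_keys = [start for start, _ in regions]
    # The owning region is the last one whose start key is <= row_key.
    idx = bisect.bisect_right(start_keys, row_key) - 1
    return regions[idx][1]
```

With this layout, `server_for_row("hadoop")` always resolves to `rs2`, so if `rs2` is blocking updates, writes for that row wait on `rs2` no matter how idle `rs1` is.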

St.Ack