hadoop-mapreduce-user mailing list archives

From Vijaya Narayana Reddy Bhoomi Reddy <vijay.bhoomire...@gmail.com>
Subject Re: HDFS File Writes & Reads
Date Fri, 20 Jun 2014 05:54:41 GMT
Yong,

Thanks for the clarification. It was more of an academic query. We do not
have any performance requirements at this stage.

Regards
Vijay


On 19 June 2014 19:05, java8964 <java8964@hotmail.com> wrote:

> Your understanding is almost correct, but not the part you highlighted.
>
> HDFS is not designed for write performance, but the client doesn't have
> to wait for the acknowledgment of previous packets before sending the
> next ones.
>
> This webpage describes it clearly; I hope it is helpful for you.
>
> http://aosabook.org/en/hdfs.html
>
> Quoted
>
> The next packet can be pushed to the pipeline before receiving the
> acknowledgment for the previous packets. The number of outstanding packets
> is limited by the outstanding packets window size of the client.
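>
> As a rough sketch of what this looks like from the client side
> (illustrative only; the path and content are made up, and the packet
> pipelining happens inside the output stream rather than in user code):
>
>     // Write a file through the FileSystem API. The stream underneath
>     // splits the data into packets and pushes them down the datanode
>     // pipeline without waiting for each ack, up to the client's
>     // outstanding-packet window.
>     import java.io.OutputStream;
>     import java.nio.charset.StandardCharsets;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     public class HdfsWriteSketch {
>         public static void main(String[] args) throws Exception {
>             // Picks up core-site.xml / hdfs-site.xml from the classpath
>             Configuration conf = new Configuration();
>             try (FileSystem fs = FileSystem.get(conf);
>                  OutputStream out = fs.create(new Path("/tmp/example.txt"))) {
>                 // One sequential stream; replication to 3 datanodes is
>                 // done by the pipeline, not by three separate client writes.
>                 out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
>             }
>         }
>     }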
>
> Do you have any performance requirements for ingesting data into HDFS?
>
> Yong
>
> ------------------------------
> Date: Thu, 19 Jun 2014 11:51:43 +0530
> Subject: Re: HDFS File Writes & Reads
> From: vijay.bhoomireddy@gmail.com
> To: user@hadoop.apache.org
>
>
> @Zesheng Wu, thanks for the response.
>
> I still don't understand how HDFS reduces the time to write and read a
> file, compared to a traditional file read / write mechanism.
>
> For example, if I am writing a file using the default configuration,
> Hadoop internally has to write each block to 3 data nodes. My understanding
> is that for each block, the client first writes the block to the first data
> node in the pipeline, which then forwards it to the second and so on. Once
> the third data node successfully receives the block, it sends an
> acknowledgement back to data node 2 and finally to the client through data
> node 1. *Only after receiving the acknowledgement for the block is the
> write considered successful, and only then does the client proceed to
> write the next block.*
>
> If this is the case, then the time taken to write each block is three
> times that of a normal write due to the replication factor, and the write
> process happens sequentially, block after block.
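>
> For reference, block placement can be inspected from the client side with
> something like the following (just a sketch; the path is only an example):
>
>     // List the block locations of an existing file to see where the
>     // replicas of each block were placed by the write pipeline.
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.BlockLocation;
>     import org.apache.hadoop.fs.FileStatus;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     public class BlockLocationsSketch {
>         public static void main(String[] args) throws Exception {
>             FileSystem fs = FileSystem.get(new Configuration());
>             Path file = new Path("/tmp/example.txt");   // example path
>             FileStatus status = fs.getFileStatus(file);
>             BlockLocation[] blocks =
>                     fs.getFileBlockLocations(status, 0, status.getLen());
>             for (BlockLocation block : blocks) {
>                 System.out.printf("offset=%d length=%d hosts=%s%n",
>                         block.getOffset(), block.getLength(),
>                         String.join(",", block.getHosts()));
>             }
>             fs.close();
>         }
>     }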
>
> Please correct me if my understanding is wrong. Also, two further
> questions:
>
> 1. My understanding is that file reads / writes in Hadoop don't have any
> parallelism, and the best they can do is match a traditional file read or
> write, plus some overhead from the distributed communication mechanism.
> 2. Parallelism is provided only during the data processing phase via
> MapReduce, but not during file reads / writes by a client.
>
> Regards
> Vijay
>
>
>
> On 17 June 2014 19:37, Zesheng Wu <wuzesheng86@gmail.com> wrote:
>
> 1. HDFS doesn't allow parallel writes to a single file
> 2. HDFS uses a pipeline to write the multiple replicas, so it doesn't take
> three times as long as a traditional file write
> 3. HDFS allows parallel reads (see the sketch below)
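>
> A quick sketch of point 3 (the path is only an example): two threads can
> do positioned reads against the same file at the same time:
>
>     // Two concurrent readers of the same HDFS file. Positioned reads do
>     // not move the stream, so the readers don't interfere with each other.
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FSDataInputStream;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     public class ParallelReadSketch {
>         public static void main(String[] args) throws Exception {
>             FileSystem fs = FileSystem.get(new Configuration());
>             Path file = new Path("/tmp/example.txt");   // example path
>             Runnable reader = () -> {
>                 byte[] buf = new byte[4096];
>                 try (FSDataInputStream in = fs.open(file)) {
>                     // read(position, buffer, offset, length) is a
>                     // positioned read starting at byte 0 of the file.
>                     int n = in.read(0L, buf, 0, buf.length);
>                     System.out.println(Thread.currentThread().getName()
>                             + " read " + n + " bytes");
>                 } catch (Exception e) {
>                     e.printStackTrace();
>                 }
>             };
>             Thread t1 = new Thread(reader, "reader-1");
>             Thread t2 = new Thread(reader, "reader-2");
>             t1.start(); t2.start();
>             t1.join(); t2.join();
>             fs.close();
>         }
>     }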
>
>
> 2014-06-17 19:17 GMT+08:00 Vijaya Narayana Reddy Bhoomi Reddy <
> vijay.bhoomireddy@gmail.com>:
>
> Hi,
>
> I have a basic question regarding file writes and reads in HDFS. Are file
> writes and reads sequential activities, or are they executed in parallel?
>
> For example, let's assume there is a file File1 which consists of three
> blocks B1, B2 and B3.
>
> 1. Will the write process write B2 only after B1 is complete and B3 only
> after B2 is complete, or, for a large file with many blocks, can this
> happen in parallel? In all the Hadoop documentation, I read this to be a
> sequential operation. Does that mean that a 1TB file takes three times as
> long as a traditional file write (due to the default replication factor
> of 3)?
> 2. Is it similar in the case of read as well?
>
> Could someone kindly provide some clarity on this?
>
> Regards
> Vijay
>
>
>
>
> --
> Best Wishes!
>
> Yours, Zesheng
>
>
>
