hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: HDFS File Read
Date Thu, 08 Nov 2007 23:47:28 GMT

That's too long; buffer size does not explain it. The only small problem I 
see in your code:

 > totalBytesRead += bytesReadThisRead;
 > fileNotReadFully = (bytesReadThisRead != -1);

totalBytesRead ends up off by 1, because the final -1 returned by read() is 
added to it. Not sure where totalBytesRead is used.
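
A minimal sketch of a loop that avoids this, assuming a plain 
java.io.InputStream. The class and method names (ReadLoopSketch, countBytes) 
are made up for illustration, and a ByteArrayInputStream stands in for the 
stream returned by dsmStore.select:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class for illustration; not part of the original code.
class ReadLoopSketch {

    // Reads the stream to the end and returns the number of bytes read.
    static long countBytes(InputStream in) throws IOException {
        byte[] data = new byte[4096];
        long totalBytesRead = 0;
        int bytesReadThisRead;
        // read() returns -1 at end of stream, so test the result
        // before adding it to the running total.
        while ((bytesReadThisRead = in.read(data)) != -1) {
            totalBytesRead += bytesReadThisRead;
        }
        return totalBytesRead;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: an in-memory stream stands in for the HDFS stream.
        try (InputStream in = new ByteArrayInputStream(new byte[10000])) {
            System.out.println(countBytes(in)); // prints 10000
        }
    }
}
```

Checking for -1 in the loop condition keeps the end-of-stream marker out of 
the total, and letting the IOException propagate means a failed read cannot 
leave the loop spinning.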

If you can, try checking tcpdump on your client machine (for datanode 
port 50010).

Raghu.

j2eeiscool wrote:
> Hi Raghu,
> 
> Many thanx for your reply:
> 
> The write takes approximately:  11367 millisecs.
> 
> The read takes approximately: 1610565 millisecs.
> 
> File size is  68573254 bytes and hdfs block size is 64 megs.
> 
> 
> Here is the  Writer code:
> 
> 			FileInputStream fis = null;
> 			OutputStream os = null;
> 			try {
> 		        fis = new FileInputStream(new File(inputFile));
> 		        os = dsmStore.insert(outputFile);
> 
> 
> 
> dsmStore.insert does the following:
> {
> 
> 		DistributedFileSystem fileSystem = new DistributedFileSystem();
> 		fileSystem.initialize(uri, conf);
>         Path path = new Path(sKey);
>         //writing:
>         FSDataOutputStream dataOutputStream = fileSystem.create(path);
>         
>         return dataOutputStream;
> 
> }		        
> 
> 
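> 		        byte[] data = new byte[4096];
> 		        int bytesRead;
> 		        // Write only the bytes actually read; the last chunk may be short.
> 		        while ((bytesRead = fis.read(data)) != -1) {
> 		        	os.write(data, 0, bytesRead);
> 		        	os.flush();
> 		        }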
> 			} catch (Exception e) {
> 				e.printStackTrace();
> 			}
> 			finally {
> 				if (os != null) {
> 			        try {
> 						os.close();
> 					} catch (IOException e) {
> 						// TODO Auto-generated catch block
> 						e.printStackTrace();
> 					}					
> 				}
> 
> 				if (fis != null) {
> 			        try {
> 						fis.close();
> 					} catch (IOException e) {
> 						// TODO Auto-generated catch block
> 						e.printStackTrace();
> 					}					
> 				}
> 				
> 				
> 			}
> 		}
> 
> 
> Here is the  Reader code:
> 
> 
> 	        byte[] data = new byte[4096];
> 	        int totalBytesRead = 0;
> 	        boolean fileNotReadFully = true;
> 	        InputStream is = dsmStore.select(fileName);
> 
> 
> dsmStore.select does the following:
> {
> 
> 		DistributedFileSystem fileSystem = new DistributedFileSystem();
> 		fileSystem.initialize(uri, conf);
> 		Path path = new Path(sKey);
>         FSDataInputStream dataInputStream = fileSystem.open(path);
> 
>         return dataInputStream;
> 
> }		        
> 
> 
> 	        
> 			while (fileNotReadFully) {
>                                 int bytesReadThisRead = 0 ;
> 				try {
> 					bytesReadThisRead = is.read(data);
> 					totalBytesRead += bytesReadThisRead;
> 					fileNotReadFully = (bytesReadThisRead != -1);
> 				} catch (Exception e) {
> 					e.printStackTrace();
> 				}
> 			}
> 			if (is != null) {
> 				try {
> 					is.close();
> 				} catch (IOException e) {
> 					// TODO Auto-generated catch block
> 					e.printStackTrace();
> 				}
> 			}
> 
> 
> Could probably try different buffer sizes etc.
> 
> Thanx,
> Taj
> 
> 
> Raghu Angadi wrote:
>>
>> How slow is it? Maybe the code that reads is relevant too.
>>
>> Raghu.
>>
>> j2eeiscool wrote:
>>> Hi,
>>>
>>> I am new to hadoop. We are evaluating HDFS for use as a reliable,
>>> distributed file system.
>>>
>>> From the tests (1 name + 1 data, both on different RHEL 4 m/cs, client
>>> running on the name node m/c) I have run so far:
>>>
>>> 1. The writes are very fast.
>>>
>>> 2. The read is very slow (reading a 68 MB file). Here is the sample code.
>>> Any ideas what could be going wrong?
>>>
>>>
>>> 	public InputStream select(String sKey) throws RecordNotFoundException,
>>> IOException {
>>> 		DistributedFileSystem fileSystem = new DistributedFileSystem();
>>> 		fileSystem.initialize(uri, conf);
>>> 		Path path = new Path(sKey);
>>>         FSDataInputStream dataInputStream = fileSystem.open(path);
>>>         return dataInputStream;
>>>
>>> 	}
>>>
>>> Thanx,
>>> Taj
>>>
>>>
>>
>>
> 

