hadoop-common-user mailing list archives

From Aaron Kimball <...@cs.washington.edu>
Subject Re: HDFS File Read
Date Sun, 18 Nov 2007 00:45:56 GMT
You could write the file out under a dummy name and then rename it to 
the target filename after the write is complete. The reader simply 
blocks until the correct filename exists.
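
A minimal sketch of that pattern, assuming the standard FileSystem API. The
paths, the data byte array, and the 30 ms poll interval are illustrative, and
error handling is omitted:

	import org.apache.hadoop.conf.Configuration;
	import org.apache.hadoop.fs.FSDataOutputStream;
	import org.apache.hadoop.fs.FileSystem;
	import org.apache.hadoop.fs.Path;

	Configuration conf = new Configuration();
	// Connect to the default filesystem named in the configuration.
	FileSystem fs = FileSystem.get(conf);

	Path tmp = new Path("/hadoopdata0.txt.tmp");   // dummy name while writing
	Path target = new Path("/hadoopdata0.txt");    // name the reader waits for

	// Writer: write everything under the dummy name, close, then rename.
	FSDataOutputStream out = fs.create(tmp);
	out.write(data);          // data is a placeholder for the payload bytes
	out.close();              // the file is complete before it becomes visible
	fs.rename(tmp, target);   // publish it under the name the reader polls for

	// Reader: block until the target name exists, then open it normally.
	while (!fs.exists(target)) {
		Thread.sleep(30);     // InterruptedException handling omitted
	}

The rename is a namenode metadata operation, so the reader never sees a
half-written /hadoopdata0.txt under the target name.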

- Aaron

j2eeiscool wrote:
> Hi Raghu,
> 
> I understand that.
> 
> I have also read that there is something in the works which will address
> some of this (Reader able to get data before Writer is completely done:
> HADOOP-1700).
> 
> 
> In my test the Writer and Reader are different threads (they could even be
> different processes).
> 
> So how does the Reader know that the Writer is done writing the data (my
> requirement is that the Reader grab the data asap)?
> 
> 1. Previously I was relying on the Reader NOT getting the exception
> (07/11/17 11:07:13 INFO fs.DFSClient: Could not obtain block
> blk_3484370064020998905 from any node:  java.io.IOException: No live nodes
> contain current block) as a starting signal for the Reader.
> 
> 2. Now I have added the following check on the Reader side:
> 
> 		DistributedFileSystem fileSystem = new DistributedFileSystem();
> 		fileSystem.initialize(uri, conf);
> 		Path path = new Path(sKey);
> 		// Poll until the file appears in the namespace.
> 		while (!fileSystem.exists(path)) {
> 			try {
> 				Thread.sleep(30);
> 			} catch (InterruptedException e) {
> 				// Restore the interrupt status and stop waiting.
> 				Thread.currentThread().interrupt();
> 				break;
> 			}
> 		}
> 
> But I still get this exception from time to time:
> 
> 07/11/17 11:07:10 INFO fs.DFSClient: Could not obtain block blk_8590062477849775138 from any node:  java.io.IOException: No live nodes contain current block
> 07/11/17 11:07:10 WARN fs.DFSClient: DFS Read: java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
>         at java.io.DataInputStream.read(DataInputStream.java:80)
>         at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:206)
> 
> java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
>         at java.io.DataInputStream.read(DataInputStream.java:80)
>         at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:206)
> 07/11/17 11:07:10 INFO fs.DFSClient: Could not obtain block blk_3484370064020998905 from any node:  java.io.IOException: No live nodes contain current block
> 
> 
> I could build an explicit hand-off from the Writer to the Reader (e.g. a
> marker file, sketched below), but that would be tricky across processes.
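> 
> A marker-file hand-off could reuse the same polling loop. This is a sketch
> only: the ".done" suffix is my own convention, not a Hadoop API.
> 
> 		// Writer, after closing the data file: publish an empty marker file.
> 		fileSystem.create(new Path(sKey + ".done")).close();
> 
> 		// Reader: poll for the marker instead of the data file itself
> 		// (InterruptedException handled as in the loop above).
> 		while (!fileSystem.exists(new Path(sKey + ".done"))) {
> 			Thread.sleep(30);
> 		}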
> 
> Any ideas?
> 
> Thanx,
> Taj
> 
> 
> 
> Raghu Angadi wrote:
>> Taj,
>>
>> I don't know what you are trying to do, but simultaneous write and read
>> won't work on any filesystem (unless the reader is more complicated than
>> what you had).
>>
>> For now, I think you will get most predictable behaviour if you read 
>> after writer has closed the file.
>>
>> Raghu.
>>
>> j2eeiscool wrote:
>>> Hi Dhruba,
>>>
>>> For my test I do have a Reader and Writer thread. The Reader blocks till
>>> the InputStream is available:
>>>
>>> The Reader gets the following exception till the Writer is done:
>>>
>>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /hadoopdata0.txt
>>>         at org.apache.hadoop.dfs.NameNode.open(NameNode.java:269)
>>>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
>>>
>>>         at org.apache.hadoop.ipc.Client.call(Client.java:470)
>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
>>>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>>>         at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:585)
>>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>>>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:864)
>>>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:856)
>>>         at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:277)
>>>         at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:122)
>>>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:244)
>>>         at HadoopDSMStore.select(HadoopDSMStore.java:44)
>>>         at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:174)
>>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /hadoopdata0.txt
>>>         at org.apache.hadoop.dfs.NameNode.open(NameNode.java:269)
>>>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
>>>
>>>         at HadoopDSMStore.select(HadoopDSMStore.java:44)
>>>         at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:174)
>>>
>>>
>>> 1. Is there an API (like isFileAvailable(fileName)) the Reader needs to
>>> check before starting?
>>>
>>> 2. Should there be a delay between the Writer's end and the Reader's start?
>>>
>>> Thanx,
>>> Taj
>>
>>
>>
> 
