hadoop-common-user mailing list archives

From Pallavi Palleti <pallavi.pall...@corp.aol.com>
Subject Re: File is closed but data is not visible
Date Thu, 13 Aug 2009 04:15:51 GMT
yes.
----- Original Message -----
From: "Raghu Angadi" <rangadi@yahoo-inc.com>
To: common-user@hadoop.apache.org
Sent: Wednesday, August 12, 2009 10:09:55 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: Re: File is closed but data is not visible


What happens when the while loop ends? Is 'out' closed then?
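A minimal sketch of the pattern this question points at: closing 'out' in a finally block, so the stream is closed whether the while loop ends normally or by an exception. This uses plain java.io streams so it runs without an HDFS cluster; the writeAll method is a hypothetical stand-in for the posted loop, and a real FSDataOutputStream from FileSystem.create() would be closed the same way.

```java
import java.io.*;

public class ExplicitClose {
    // Copy lines from reader to out, guaranteeing BOTH streams are closed
    // whether the loop ends normally or throws. Relying on the finalizer
    // to close the output stream delays the close indefinitely.
    static void writeAll(BufferedReader reader, Writer out) throws IOException {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                out.write(line + "\n");
            }
        } finally {
            try {
                reader.close();
            } finally {
                out.close(); // close 'out' here too, not only on slot change
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        writeAll(new BufferedReader(new StringReader("a\nb")), sink);
        System.out.println(sink.toString().equals("a\nb\n") ? "ok" : "mismatch");
    }
}
```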

Palleti, Pallavi wrote:
> No. I am closing it before opening a new one
> 
> if (out != null) // if any output stream was opened previously, close it
> {
>   logger.info("Closing writer of -" + paramWrapper.getOutFileStr());
>   out.close();
>   out = null;
> }
> 
> Thanks
> Pallavi
> 
> -----Original Message-----
> From: Jason Venner [mailto:jason.hadoop@gmail.com] 
> Sent: Wednesday, August 12, 2009 7:31 PM
> To: common-user@hadoop.apache.org
> Subject: Re: File is closed but data is not visible
> 
> You do not appear to close out, except when an exception occurs.
> The finally block only closes the reader.
> 
> On Wed, Aug 12, 2009 at 6:24 AM, Palleti, Pallavi <
> pallavi.palleti@corp.aol.com> wrote:
> 
>> Hi Jason,
>>
>> Kindly find the snippet of code which creates and closes the file.
>>
>> Variables passed to the method: FSDataOutputStream out, ParamWrapper paramWrapper
>>
>> Snippet:
>>
>>    String inputLine = null;
>>    int status = 0;
>>
>>    BufferedReader reader = null;
>>
>>    try {
>>      reader = new BufferedReader(new InputStreamReader(....)); // reader initialization
>>
>>      while ((inputLine = reader.readLine()) != null) {
>>
>>        Date date = getLoggedDate(inputLine); // parse the line to get the logged date
>>        if (date == null) // if input data is wrong, don't write
>>        {
>>          continue;
>>        }
>>        Calendar cal = Calendar.getInstance();
>>        cal.setTime(date);
>>        int hour = cal.get(Calendar.HOUR_OF_DAY); // get input hour
>>        int minutes = cal.get(Calendar.MINUTE); // get input minute
>>
>>        int outputMinute = minutes / timePeriod + 1; // compute the slot
>>        if (paramWrapper.prevHour != hour
>>            || paramWrapper.prevMin != outputMinute) // if it is a new slot
>>        {
>>          if (out != null) // if any output stream was opened previously, close it
>>          {
>>            logger.info("Closing writer of -" + paramWrapper.getOutFileStr());
>>            out.close();
>>            out = null;
>>          }
>>          String outFileStr = generateFileName(rootDir, hdfsOutFile, outputMinute, date); // generate file name, e.g. location/year/month/day/hour/_1.txt
>>          Path outFile = new Path(outFileStr);
>>
>>          paramWrapper.setOutFileStr(outFileStr);
>>          logger.info("Creating outFile:" + outFileStr);
>>
>>          out = fs.create(outFile); // create new file and get output stream
>>          paramWrapper.setPrevHour(hour);
>>          paramWrapper.setPrevMin(outputMinute);
>>        }
>>        StringBuilder outLineStr = new StringBuilder();
>>        outLineStr.append(inputLine).append("\n");
>>        out.write(outLineStr.toString().getBytes());
>>      }
>>    } catch (IOException ioe) {
>>      logger.error("Main: IO Exception while writing to HDFS, exiting... ", ioe);
>>      // before exiting do the cleanup
>>      close(reader);
>>      System.exit(-1);
>>    } catch (Exception e) {
>>      logger.error("Unexpected error while writing to HDFS, exiting ...", e);
>>      // before exiting do the cleanup
>>      close(reader);
>>      System.exit(-1);
>>    } finally {
>>      close(reader);
>>    }
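For comparison, a sketch of the same rotation logic restructured so the last open stream is always closed: in the snippet above, 'out' is closed only when the slot changes, and the catch/finally paths close only the reader, so the final slot's file is never explicitly closed. Plain java.io StringWriter sinks stand in for fs.create() so this runs standalone; writeBuckets and its record format are hypothetical, not the poster's code.

```java
import java.io.*;
import java.util.*;

public class SlotRotation {
    // Write each (slot, line) record, opening a new writer when the slot
    // changes and closing the previous one. The finally clause closes the
    // LAST writer too, which the posted snippet omits.
    static void writeBuckets(List<String[]> records, Map<String, StringWriter> files)
            throws IOException {
        Writer out = null;
        String prevSlot = null;
        try {
            for (String[] rec : records) {
                String slot = rec[0];
                if (!slot.equals(prevSlot)) {
                    if (out != null) {
                        out.close(); // close the previous slot's file
                    }
                    StringWriter w = new StringWriter(); // stand-in for fs.create()
                    files.put(slot, w);
                    out = w;
                    prevSlot = slot;
                }
                out.write(rec[1] + "\n");
            }
        } finally {
            if (out != null) {
                out.close(); // without this, the final file stays open
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, StringWriter> files = new LinkedHashMap<>();
        writeBuckets(Arrays.asList(
                new String[]{"_1", "a"}, new String[]{"_1", "b"},
                new String[]{"_2", "c"}), files);
        System.out.print(files.get("_1")); // contents of the first slot's file
        System.out.print(files.get("_2")); // contents of the second slot's file
    }
}
```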
>>
>> Thanks
>> Pallavi
>>
>>
>> -----Original Message-----
>> From: Jason Venner [mailto:jason.hadoop@gmail.com]
>> Sent: Wednesday, August 12, 2009 6:35 PM
>> To: common-user@hadoop.apache.org
>> Subject: Re: File is closed but data is not visible
>>
>> Are you explicitly calling close on the FSDataOutputStream that you received from the FileSystem.create method?
>> It sounds like the close is actually happening in the finalizer method on the object.
>>
>> Can you post the relevant code, or provide a cut down demonstrator?
>>
>> On Wed, Aug 12, 2009 at 5:57 AM, Palleti, Pallavi <
>> pallavi.palleti@corp.aol.com> wrote:
>>
>>> Hi Jason,
>>>
>>> The file is neither visible via the Namenode UI nor via the program (checking whether the file exists).
>>>
>>> There is no caching happening at the application level. The application is pretty simple. We are taking Apache logs and putting them into timely buckets based on the logged time of the records. We are creating 4 files (one for every 15 minutes) for every hour. So, at the client side, we look at the logs, and if the data belongs to the current interval, we write into the currently open HDFS file. If it belongs to a new interval, the old file is closed and a new file is created. I have been logging the time at which the file is created and closed on my client side, and I can see that the file is getting closed at the expected time. But when I look for the same file on the hadoop cluster, it is still not created, and only if I wait for another 1 to 2 hours can I see the file.
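The bucketing described above matches the snippet's slot formula (outputMinute = minutes / timePeriod + 1). A minimal sketch, assuming timePeriod = 15 as implied by the four 15-minute files per hour:

```java
public class SlotDemo {
    // Map a minute-of-hour (0-59) to a 1-based slot, as in the posted
    // snippet: outputMinute = minutes / timePeriod + 1.
    static int slot(int minutes, int timePeriod) {
        return minutes / timePeriod + 1;
    }

    public static void main(String[] args) {
        // With timePeriod = 15: minutes 0-14 -> slot 1, ..., 45-59 -> slot 4,
        // matching file names like location/year/month/day/hour/_1.txt
        System.out.println(slot(0, 15));   // 1
        System.out.println(slot(14, 15));  // 1
        System.out.println(slot(44, 15));  // 3
        System.out.println(slot(59, 15));  // 4
    }
}
```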
>>>
>>> Thanks
>>> Pallavi
>>>
>>>
>>> -----Original Message-----
>>> From: Jason Venner [mailto:jason.hadoop@gmail.com]
>>> Sent: Wednesday, August 12, 2009 6:03 PM
>>> To: common-user@hadoop.apache.org
>>> Subject: Re: File is closed but data is not visible
>>>
>>> Is it possible that your application is caching some data and not refreshing it when you expect?
>>> The HDFS file visibility semantics are well understood, and your case does not fit with that understanding.
>>> A factor that hints strongly at this is that your file is visible via the Namenode UI; there is nothing special about that UI.
>>>
>>> On Tue, Aug 11, 2009 at 9:00 PM, Pallavi Palleti <
>>> pallavi.palleti@corp.aol.com> wrote:
>>>
>>>> Hi Raghu,
>>>>
>>>> The file doesn't appear in the cluster when I look at it from the Namenode UI.
>>>> Also, I have a monitor at the cluster side which checks whether the file is created and throws an exception when it is not. And it threw an exception saying "File not found".
>>>>
>>>> Thanks
>>>> Pallavi
>>>> ----- Original Message -----
>>>> From: "Raghu Angadi" <rangadi@yahoo-inc.com>
>>>> To: common-user@hadoop.apache.org
>>>> Sent: Wednesday, August 12, 2009 12:10:12 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
>>>> Subject: Re: File is closed but data is not visible
>>>>
>>>>
>>>> Your assumption is correct. When you close the file, others can read the data. There is no delay expected before the data is visible. If there is an error, either write() or close() would throw it.
>>>>
>>>> When you say the data is not visible, do you mean readers cannot see the file, or cannot see the data? Is it guaranteed that readers open the file _after_ close returns on the writer?
>>>>
>>>> Raghu.
>>>>
>>>> Palleti, Pallavi wrote:
>>>>> Hi Jason,
>>>>>
>>>>> Apologies for missing the version information in my previous mail. I am using hadoop-0.18.3. I am getting the FSDataOutputStream object using fs.create(new Path(some_file_name)), where fs is a FileSystem object. And I am closing the file using close().
>>>>>
>>>>> Thanks
>>>>> Pallavi
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jason Venner [mailto:jason.hadoop@gmail.com]
>>>>> Sent: Tuesday, August 11, 2009 6:24 PM
>>>>> To: common-user@hadoop.apache.org
>>>>> Subject: Re: File is closed but data is not visible
>>>>>
>>>>> Please provide information on what version of hadoop you are using and the method of opening and closing the file.
>>>>>
>>>>>
>>>>> On Tue, Aug 11, 2009 at 12:48 AM, Pallavi Palleti <
>>>>> pallavi.palleti@corp.aol.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> We have an application where we pull logs from an external server (far apart from the hadoop cluster) to the hadoop cluster. Sometimes we see a huge delay (of 1 hour or more) before the data actually appears in HDFS, even though the file has been closed and the variable set to null by the external application. I was under the impression that when I close the file, the data gets reflected in the hadoop cluster. Now, in this situation, it is even more complicated to handle write failures, as it gives a false impression to the client that the data has been written to HDFS. Kindly clarify whether my perception is correct. If yes, could someone tell me what is causing the delay in actually showing the data? During those cases, how can we tackle write failures (due to temporary issues like a data node being unavailable or a disk being full), as there is no way we can detect the failure at the client side?
>>>>>>
>>>>>> Thanks
>>>>>> Pallavi
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
>>> http://www.amazon.com/dp/1430219424?tag=jewlerymall
>>> www.prohadoopbook.com a community for Hadoop Professionals
>>>
>>
>>
>>
> 
> 
> 

