hadoop-common-dev mailing list archives

From "Monu Ogbe" <monu.o...@richmondinformatics.com>
Subject Re: Help: -copyFromLocal
Date Thu, 30 Mar 2006 18:35:18 GMT
This is now reported as HADOOP-112 in JIRA. 

----- Original Message ----- 
From: "Eric Baldeschwieler" <eric14@yahoo-inc.com>
To: <hadoop-dev@lucene.apache.org>
Sent: Thursday, March 30, 2006 5:42 AM
Subject: Re: Help: -copyFromLocal


> Interesting.  It would actually be nice to include the CRCs in an  
> export, so that you can validate your data when you reload it.  CRCs  
> are best if they are kept end to end.
> 
> On Mar 29, 2006, at 9:59 AM, Doug Cutting wrote:
> 
>> monu.ogbe@richmondinformatics.com wrote:
>>> However I'm getting the following error:
>>> copyFromLocal: Target /user/root/crawl/crawldb/current/part-00000/.data.crc
>>> already exists
>>
>> Please file a bug report.  The problem is that when copyFromLocal
>> enumerates local files it should exclude .crc files, but it does
>> not.  This is the listFiles() call at DistributedFileSystem:160.  It
>> should filter the listing, excluding files for which
>> FileSystem.isChecksumFile() is true.
>>
>> BTW, as a workaround, it is safe to first remove all of the .crc
>> files, but your files will no longer be checksummed as they are
>> read.  On systems without ECC memory, file corruption is not
>> uncommon, but I have seen very little on clusters that have ECC.
>>
>> Doug
> 
>
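
The filtering Doug describes can be sketched with plain java.io, outside of Hadoop. This is only an illustration, not the actual fix for HADOOP-112: the class CrcFilterSketch and the helpers looksLikeChecksumFile() and listNonChecksumFiles() are hypothetical names, and the rule "name starts with '.' and ends with '.crc'" is an assumption based on the .data.crc file in the error message, standing in for FileSystem.isChecksumFile().

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrcFilterSketch {

    // Hypothetical stand-in for FileSystem.isChecksumFile(): assumes checksum
    // files are hidden siblings named ".<name>.crc", as in ".data.crc".
    static boolean looksLikeChecksumFile(File f) {
        String name = f.getName();
        return name.startsWith(".") && name.endsWith(".crc");
    }

    // Enumerate a local directory, skipping anything that looks like a
    // checksum file, so a copy loop never tries to upload it explicitly.
    static List<File> listNonChecksumFiles(File dir) {
        FileFilter keepData = f -> !looksLikeChecksumFile(f);
        File[] entries = dir.listFiles(keepData);
        List<File> result = new ArrayList<>();
        if (entries != null) { // listFiles returns null if dir is not a directory
            result.addAll(Arrays.asList(entries));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory holding one data file and its checksum.
        File dir = Files.createTempDirectory("crc-sketch").toFile();
        new File(dir, "data").createNewFile();
        new File(dir, ".data.crc").createNewFile();

        for (File f : listNonChecksumFiles(dir)) {
            System.out.println(f.getName());
        }
    }
}
```

With the filter in place only the data file is listed, so the checksum file would be regenerated on the target side rather than copied as a second, conflicting entry.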
