hive-user mailing list archives

From Laurent Laborde <kerdez...@gmail.com>
Subject Re: tons of bugs and problem found
Date Tue, 01 Feb 2011 20:54:32 GMT
Thank you for your replies.
I reinstalled Hadoop and Hive, switched from Cloudera CDH3 to CDH2,
restarted everything from scratch,
and set io.skip.checksum.errors=true.

I still have the same error :(

What's wrong? :(
The dataset comes from a PostgreSQL database and is consistent.


On Tue, Feb 1, 2011 at 6:57 AM, Aaron Kimball <akimball83@gmail.com> wrote:
> In MapReduce, filenames that begin with an underscore are "hidden" files and
> are not enumerated by FileInputFormat (Hive, I believe, processes tables
> with TextInputFormat and SequenceFileInputFormat, both descendants of this
> class).
> Using "_foo" as a hidden/ignored filename is conventional in the Hadoop
> world. This is different than the UNIX convention of using ".foo", but
> that's software engineering for you. ;)
> This is unlikely to change soon; MapReduce emits files with names like
> "_SUCCESS" into directories to indicate successful job completion.
> Directories such as "_tmp" and "_logs" also appear in datasets, and are
> therefore ignored as input by MapReduce-based tools, but those metadata
> names are established in other projects.
> If you run 'hadoop fs -mv /path/to/_top.sql /path/to/top.sql', that should
> make things work for you.
> - Aaron
>
> On Mon, Jan 31, 2011 at 10:21 AM, yongqiang he <heyongqiangict@gmail.com>
> wrote:
>>
>> You can first try to set io.skip.checksum.errors to true, which will
>> ignore bad checksum.
>>
>> >>In facebook, we also had a requirement to ignore corrupt/bad data - but
>> >> it has not been committed yet. Yongqiang, what is the jira number ?
>> There seems to be no JIRA for this issue.
>>
>> thanks
>> yongqiang
>> 2011/1/31 Namit Jain <njain@fb.com>:
>> >
>> >
>> > On 1/31/11 7:46 AM, "Laurent Laborde" <kerdezixe@gmail.com> wrote:
>> >
>> >>On Fri, Jan 28, 2011 at 8:05 AM, Laurent Laborde <kerdezixe@gmail.com>
>> >>wrote:
>> >>> On Fri, Jan 28, 2011 at 1:12 AM, Namit Jain <njain@fb.com> wrote:
>> >>>> Hi Laurent,
>> >>>>
>> >>>> 1. Are you saying that _top.sql did not exist in the home directory?
>> >>>> Or that _top.sql existed, but Hive was not able to read it after
>> >>>> loading?
>> >>>
>> >>> It exists, it's loaded, and I can see it in Hive's warehouse
>> >>> directory.
>> >>> It's just impossible to query it.
>> >>>
>> >>>> 2. I don't think reserved words are documented anywhere. Can you
>> >>>> file a JIRA for this?
>> >>>
>> >>> Ok; will do that today.
>> >>>
>> >>>> 3. The bad row is printed in the task log.
>> >>>>
>> >>>> 1. 2011-01-27 11:11:07,046 INFO org.apache.hadoop.fs.FSInputChecker:
>> >>>> Found checksum error: b[1024, 1536]=
>> >>>> 7374796c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b2
>> >>>> 66e6273703b266e6273703b202a202838302920416d69656e733a3c2f623e3c2f7370616e3e
>> >>>> 3c2f7370616e3e5c6e20203c2f703e5c6e20203c703e5c6e202020203c7370616e207374796
>> >>>> c653d22666f66742d66616d696c793a2068656c7665746963613b223e3c7370616e20737479
>> >>>> 6c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b266e6273703b2
>> >>>> 66e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e
>> >>>> 6273703b206f203132682c2050697175652d6e6971756520646576616e74206c65205265637
>> >>>> 46f7261742e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e20203c2f703e5c6e20203c70
>> >>>> 3e5c6e202020203c7370616e207374796c653d22666f6e742d66616d696c793a2068656c766
>> >>>> 5746963613b223e3c7370616e207374796c653d22666f6e742d73697a653a20313270743b22
>> >>>> 3e3c623e266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e627
>> >>>> 3703b266e6273703b266e6273703b266e6273703b206f2031346833302c204d6169736f6e20
>> >>>> 6465206c612063756c747572652e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e20203c2
>> >>>> f703e5c6e20203c703e5c6e202020203c7370616e207374796c653d
>> >>>
>> >>> Is this the actual data ?
>> >>>
>> >>>> 2. org.apache.hadoop.fs.ChecksumException: Checksum error:
>> >>>> /blk_2466764552666222475:of:/user/hive/warehouse/article/article.copy
>> >>>> at 23446528
>> >>>
>> >>> Is 23446528 the line number?
>> >>>
>> >>> thank you
>> >>
>> >>optional question (the previous ones are still open) :
>> >>is there a way to tell hive to ignore invalid data ? (if the problem
>> >>is invalid data)
>> >>
>> >
>> > Currently, no.
>> > At Facebook, we also had a requirement to ignore corrupt/bad data, but
>> > it has not been committed yet. Yongqiang, what is the JIRA number?
>> >
>> >
>> > Thanks,
>> > -namit
>> >
>> >
>> >>
>> >>--
>> >>Laurent "ker2x" Laborde
>> >>Sysadmin & DBA at http://www.over-blog.com/
>> >
>> >
>
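The hidden-file rule Aaron describes in the quoted thread can be sketched in a few lines. This is an illustration in Python, not Hadoop's own code: the helper names is_hidden and visible_inputs are made up for this note, and the "." prefix check is an addition based on my understanding of FileInputFormat's default filter (Aaron only mentions the underscore). The check applies to the last path component only.

```python
def is_hidden(path: str) -> bool:
    """Return True if the last path component looks like a hidden file.

    Assumption: names starting with "_" (per the thread) or "." are
    skipped when input files are listed. Helper name is hypothetical.
    """
    name = path.rstrip("/").rsplit("/", 1)[-1]
    return name.startswith("_") or name.startswith(".")


def visible_inputs(paths):
    """Keep only the paths that an underscore/dot filter would enumerate."""
    return [p for p in paths if not is_hidden(p)]


if __name__ == "__main__":
    candidates = [
        "/path/to/_top.sql",   # ignored: leading underscore
        "/path/to/top.sql",    # read as input
        "/path/to/_SUCCESS",   # job-completion marker, ignored
        "/path/to/_logs",      # metadata directory, ignored
    ]
    print(visible_inputs(candidates))
```

Under this rule a table backed only by _top.sql has no visible input files, which would explain why the file can be loaded but not queried until it is renamed.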
>
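Two questions in the quoted thread ("Is this the actual data ?" and "23446528 is the line number ?") can be checked offline. The sketch below is plain Python, not a Hive or Hadoop tool. It decodes a short excerpt of the hex dump from the task log and tests whether 23446528 falls on a checksum-chunk boundary. Assumptions, all mine: the chunk size is 512 bytes (inferred from the b[1024, 1536] range in the log and what I believe is the default io.bytes.per.checksum), the number after "at" is a byte position in the file rather than a line number, and the names HEX_EXCERPT, decode_dump, bit_difference and chunk_index are illustrative.

```python
# Excerpt copied from the hex dump in the quoted task log (the bytes
# that start the second style= attribute); not re-read from HDFS.
HEX_EXCERPT = "7374796c653d22666f66742d66616d696c793a2068656c7665746963613b22"

# Assumed checksum chunk size in bytes (see the lead-in).
BYTES_PER_CHECKSUM = 512


def decode_dump(hex_text: str) -> str:
    """Turn a hex string from the log back into readable text."""
    return bytes.fromhex(hex_text).decode("latin-1")


def bit_difference(a: str, b: str) -> int:
    """Count the bits that differ between two single characters."""
    return bin(ord(a) ^ ord(b)).count("1")


def chunk_index(offset: int, chunk_size: int = BYTES_PER_CHECKSUM):
    """Return (chunk number, remainder) for a byte offset."""
    return divmod(offset, chunk_size)


if __name__ == "__main__":
    print(decode_dump(HEX_EXCERPT))
    print(bit_difference("n", "f"))
    print(chunk_index(23446528))
```

The excerpt decodes to style="foft-family: helvetica;", while the other occurrences in the same dump decode to font-family. The letters "n" and "f" differ by a single bit, which would be consistent with a flipped bit in the stored block, although a transcription error in the mail cannot be ruled out. The dump as a whole is readable HTML, so it does look like actual row data. The offset 23446528 is exactly 45794 * 512, which fits a byte position on a chunk boundary better than a line number.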



-- 
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/
