hive-user mailing list archives

From Suhail Doshi <digitalwarf...@gmail.com>
Subject Re: LOAD DATA question
Date Sun, 22 Mar 2009 23:24:29 GMT
Zheng,

Do you know whether Hive may have problems going through *lots* of log files
(each about 1 MB)? I remember reading that Hadoop sometimes has problems
dealing with lots of small files because they are much smaller than its
default block size.

Suhail

On Sun, Mar 22, 2009 at 4:16 PM, Zheng Shao <zshao9@gmail.com> wrote:

> For now, please append the unix timestamp to the end of the file name.
>
> Zheng
>
>
> On Sun, Mar 22, 2009 at 12:35 PM, Suhail Doshi <suhail@mixpanel.com> wrote:
>
>> Hi there,
>>
>> I was reading some of the documentation and I came across this statement:
>> "Note that if the target table (or partition) already has a file whose name
>> collides with any of the filenames contained in *filepath* - then the
>> existing file will be replaced with the new file."
>>
>> I have rotating data logs that start at log.1, go to log.512, and then wrap
>> around back to log.1. Does this mean that when I try to LOAD DATA log.1
>> again, it's going to overwrite the earlier one?
>>
>> In MySQL, loaded data is simply appended regardless of the file name, but
>> since the file is probably being copied into HDFS as-is, this is likely
>> different. If that is what is happening, what is the solution for rotating
>> log files?
>>
>> Thanks,
>> Suhail
>>
>
>
>
> --
> Yours,
> Zheng
>
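
A minimal sketch of the workaround quoted above (appending the unix
timestamp to the end of the file name before loading), so that a rotated
log.1 never collides with an earlier log.1 already in the table. The helper
names and the table name are hypothetical, and the generated statement
assumes the file is loaded from the local filesystem with LOAD DATA LOCAL
INPATH; adjust it if the logs already live in HDFS.

```python
import time
from pathlib import Path
from typing import Optional


def timestamped_name(filename: str, ts: Optional[int] = None) -> str:
    """Append a unix timestamp to a file name (log.1 becomes log.1.<ts>)."""
    if ts is None:
        ts = int(time.time())  # seconds since the epoch
    return f"{filename}.{ts}"


def rename_for_load(path: Path, ts: Optional[int] = None) -> Path:
    """Rename a rotated log in place so its name is unique, and return the new path."""
    target = path.with_name(timestamped_name(path.name, ts))
    path.rename(target)
    return target


def load_statement(path: Path, table: str) -> str:
    """Build the Hive statement for the renamed file (table name is hypothetical)."""
    return f"LOAD DATA LOCAL INPATH '{path.as_posix()}' INTO TABLE {table}"
```

Typical use would be to call rename_for_load() on each rotated file right
before loading it, then pass the result of load_statement() to the Hive CLI.
Two files renamed within the same second still get distinct names as long as
their original names (log.1 ... log.512) differ.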



-- 
http://mixpanel.com
Blog: http://blog.mixpanel.com
