hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Best format to use
Date Tue, 09 Apr 2013 17:25:37 GMT
Pig and Hive both have support for compressed sequence files.

Regarding the best format: if it's just text log data (i.e. no
types/structure), then the best format to keep it in is compressed
text. SequenceFiles make the data splittable but add a small
overhead in space and efficiency, and none of the good codecs out there
are splittable on their own (LZO is good, but its files need
pre-indexing before they can be split).
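
To illustrate what "not splittable on its own" means, here is a minimal
Python sketch using only the standard library. It is not Hadoop code: the
helper name can_decode_from and the sample log lines are made up for this
illustration. It compresses some text with gzip and then tries to start
decoding at the beginning of the file and at the midpoint, which is roughly
what a second input split would have to do.

```python
import gzip
import zlib


def can_decode_from(blob: bytes, offset: int) -> bool:
    """Return True if a gzip decoder can start reading at `offset`.

    Hypothetical helper for illustration only; not part of any Hadoop API.
    """
    decoder = zlib.decompressobj(31)  # wbits=31: expect a gzip header
    try:
        decoder.decompress(blob[offset:])
        return True
    except zlib.error:
        # No gzip header at this position, so decoding cannot begin here.
        return False


# Made-up log lines standing in for one day's text log.
log = "".join(
    f"2013-04-09 17:25:{i % 60:02d} GET /page/{i}\n" for i in range(5000)
)
blob = gzip.compress(log.encode("utf-8"))

print("from start:   ", can_decode_from(blob, 0))
print("from midpoint:", can_decode_from(blob, len(blob) // 2))
```

Decoding from the start should succeed, while the midpoint attempt should
fail because there is no header or sync point to resume from. That is why a
whole .gz file ends up in a single split, and why SequenceFiles (which add
their own sync markers) or an LZO index are needed to split the input.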

On Tue, Apr 9, 2013 at 10:21 PM, Mark <static.void.dev@gmail.com> wrote:
> Actually, compressed sequence files may not work with Pig or Hive then, right?
> On Apr 9, 2013, at 9:50 AM, Mark <static.void.dev@gmail.com> wrote:
>> Forgetting Impala, what format would be best to use with daily logs?
>> Block-compressed sequence files?
>> On Apr 8, 2013, at 8:12 PM, Harsh J <harsh@cloudera.com> wrote:
>>> Hey Mark,
>>> The Gzip codec creates files with the extension .gz, not .deflate
>>> (which comes from DeflateCodec). You may want to re-check your settings.
>>> Impala questions are best resolved at its current user and developer
>>> community at https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user.
>>> Impala does currently support LZO (and also indexed LZO) compressed
>>> text files, however, so you may want to try that, as it's splittable
>>> (unlike Gzip files).
>>> On Tue, Apr 9, 2013 at 5:18 AM, Mark <static.void.dev@gmail.com> wrote:
>>>> Trying to determine the best format to use for storing daily logs. We
>>>> recently switched from snappy (.snappy) to gzip (.deflate), but I'm
>>>> wondering if there is something better? Our main clients for these daily
>>>> logs are Pig and Hive using an external table. We were thinking about
>>>> testing out Impala, but we see that it doesn't work with compressed text
>>>> files. Any suggestions?
>>>> Thanks
>>> --
>>> Harsh J

Harsh J
