hive-user mailing list archives

From Chen Wang <>
Subject Re: Help on loading data stream to hive table.
Date Tue, 07 Jan 2014 02:26:00 GMT
The problem is that the data is partitioned by epoch time in ten-hour buckets, and I want
all data belonging to a partition to be written into one file named after
that partition. How can I share the file writer across different bolts?
Should I route data within the same partition to the same bolt?
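Sending every record of a partition to the same bolt is what Storm's fields grouping does: tuples with equal values in the grouped field go to the same task. A minimal Python sketch of the idea follows. It assumes the partition is the ten-hour bucket containing the record's epoch timestamp (in seconds). `partition_key` and `target_task` are hypothetical helpers, and Storm computes its own hash rather than CRC32.

```python
import zlib

TEN_HOURS = 10 * 60 * 60  # partition width in seconds (assumed from "ten-hour buckets")


def partition_key(epoch_seconds: int) -> str:
    """Name of the ten-hour bucket containing epoch_seconds (the bucket's start time)."""
    return str(epoch_seconds - epoch_seconds % TEN_HOURS)


def target_task(key: str, num_tasks: int) -> int:
    """Deterministically map a partition key to one of num_tasks bolt tasks.

    zlib.crc32 returns the same value in every process, unlike Python's salted
    built-in hash() for strings, so all records of a partition reach one task.
    """
    return zlib.crc32(key.encode("utf-8")) % num_tasks


if __name__ == "__main__":
    for ts in (1388966400, 1388970000, 1389002400):
        key = partition_key(ts)
        print(ts, "->", key, "-> task", target_task(key, 4))
```

This gives each partition exactly one writer. The trade-off is that live traffic mostly falls into the current ten-hour bucket, so nearly all of it would flow through a single task at a time. Letting each bolt write its own file, with several files per partition directory, avoids that bottleneck.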

On Fri, Jan 3, 2014 at 3:27 PM, Alan Gates <> wrote:

> You shouldn’t need to write each record to a separate file.  Each Storm
> bolt should be able to write to its own file, appending records as it
> goes.  As long as you only have one writer per file, this should be fine.
> You can then close the files every 15 minutes (or whatever works for you)
> and have a separate job that creates a new partition in your Hive table
> with the files created by your bolts.
> Alan.
> On Jan 2, 2014, at 11:58 AM, Chen Wang <> wrote:
> > Guys,
> > I am using storm to read data stream from our socket server, entry by
> entry, and then write them to file: one entry per file.  At some point, i
> need to import the data into my hive table. There are several approaches i
> could think of:
> > 1. directly write to hive hdfs file whenever I get the entry(from our
> socket server). The problem is that this could be very inefficient,  since
> we have huge amount of data stream, and I would not want to write to hive
> hdfs one by one.
> > Or
> > 2 i can write the entries to files(normal file or hdfs file) on the
> disk, and then have a separate job to merge those small files into big one,
> and then load them into hive table.
> > The problem with this is, a) how can I merge small files into big files
> for hive? b) what is the best file size to upload to hive?
> >
> > I am seeking advice on both approaches, and appreciate your insight.
> > Thanks,
> > Chen
> >
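Alan's suggestion (one open file per bolt, appended to and closed on a schedule, then registered as a Hive partition by a separate job) can be sketched in Python as below. Everything here is an assumption for illustration: the table name `events`, the partition column `ts_bucket`, the file naming, and the 15-minute default. Local paths stand in for HDFS. Files are written under a dot-prefixed name and renamed when closed, on the assumption that readers skip hidden files (as Hadoop's `FileInputFormat` does), so a query never sees a half-written file.

```python
import os
import tempfile
import time
from typing import Callable, List, Optional, Tuple

TABLE = "events"        # hypothetical Hive table name
PART_COL = "ts_bucket"  # hypothetical partition column


class RollingPartitionWriter:
    """One writer per bolt task: appends records to a single open file and
    closes ("rolls") it after roll_seconds or when the partition changes."""

    def __init__(self, base_dir: str, task_id: int, roll_seconds: float = 15 * 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.base_dir = base_dir
        self.task_id = task_id
        self.roll_seconds = roll_seconds
        self.clock = clock
        self._fh = None
        self._tmp_path: Optional[str] = None
        self._final_path: Optional[str] = None
        self._partition: Optional[str] = None
        self._opened_at = 0.0
        self._seq = 0
        # (partition, directory) pairs whose files are complete and can be registered
        self.ready: List[Tuple[str, str]] = []

    def write(self, partition: str, record: str) -> None:
        now = self.clock()
        if self._fh is not None and (partition != self._partition
                                     or now - self._opened_at >= self.roll_seconds):
            self.roll()
        if self._fh is None:
            part_dir = os.path.join(self.base_dir, f"{PART_COL}={partition}")
            os.makedirs(part_dir, exist_ok=True)
            name = f"task{self.task_id}-{self._seq:05d}.txt"
            self._seq += 1
            # Dot-prefixed while open so readers that skip hidden files ignore it.
            self._tmp_path = os.path.join(part_dir, "." + name)
            self._final_path = os.path.join(part_dir, name)
            self._fh = open(self._tmp_path, "a", encoding="utf-8")
            self._partition = partition
            self._opened_at = now
        self._fh.write(record.rstrip("\n") + "\n")

    def roll(self) -> None:
        """Close the current file and publish it under its final name."""
        if self._fh is None:
            return
        self._fh.close()
        os.rename(self._tmp_path, self._final_path)
        self.ready.append((self._partition, os.path.dirname(self._final_path)))
        self._fh = None


def add_partition_sql(partition: str, location: str) -> str:
    """HiveQL that a separate job could run to register a partition directory."""
    return (f"ALTER TABLE {TABLE} ADD IF NOT EXISTS "
            f"PARTITION ({PART_COL}='{partition}') LOCATION '{location}'")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        w = RollingPartitionWriter(root, task_id=0, roll_seconds=900)
        for rec in ("entry-1", "entry-2", "entry-3"):
            w.write("1388952000", rec)
        w.roll()  # a real bolt would also call this on a timer
        for part, loc in w.ready:
            print(add_partition_sql(part, loc))
```

Because each task writes its own files, several files can share one partition directory, and `ADD IF NOT EXISTS` makes registering the same partition again harmless. This also sidesteps merging small files, since the roll interval controls how large each file grows. The interval check above only runs when a record arrives, so a real bolt would also need to call `roll()` periodically.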
