hive-user mailing list archives

From Chen Wang
Subject Help on loading data stream to hive table.
Date Thu, 02 Jan 2014 19:58:31 GMT
I am using Storm to read a data stream from our socket server, entry by
entry, and write each entry to its own file. At some point I need to
import the data into my Hive table. There are several approaches I can
think of:
1. Write directly to the Hive HDFS files whenever I get an entry (from our
socket server). The problem is that this could be very inefficient, since
we have a huge volume of streaming data, and I would not want to write to
Hive's HDFS files one entry at a time.
2. Write the entries to files (normal files or HDFS files) on disk, then
have a separate job merge those small files into big ones and load them
into the Hive table.
The problem with this is: a) how can I merge small files into big files for
Hive? b) what is the best file size to upload to Hive?
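
For question (a), a minimal sketch of what the separate merge job in approach 2 might look like, in Python. It assumes the entries sit in a local directory, one newline-delimited text entry per file. The names merge_small_files, target_bytes, and the merged-NNNNN.txt naming are hypothetical, and the 128 MB default is only an assumed threshold that matches a common HDFS block size; it is not an answer to question (b).

```python
import os


def merge_small_files(src_dir, dst_dir, target_bytes=128 * 1024 * 1024):
    """Concatenate small entry files from src_dir into larger files in dst_dir.

    A new output file is started once the current one reaches target_bytes
    (an assumed threshold). Returns the list of merged file paths.
    """
    os.makedirs(dst_dir, exist_ok=True)
    merged_paths = []
    out = None
    written = 0
    # Sort by name so the merge order is deterministic.
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue
        if out is None:
            # Hypothetical naming scheme for the merged output files.
            out_path = os.path.join(dst_dir, "merged-%05d.txt" % len(merged_paths))
            out = open(out_path, "wb")
            merged_paths.append(out_path)
            written = 0
        with open(path, "rb") as src:
            data = src.read()
        # Keep one entry per line in the merged file.
        if not data.endswith(b"\n"):
            data += b"\n"
        out.write(data)
        written += len(data)
        if written >= target_bytes:
            out.close()
            out = None
    if out is not None:
        out.close()
    return merged_paths
```

Each merged file could then be loaded with a Hive LOAD DATA INPATH ... INTO TABLE statement. The sketch does not delete or mark the source files, so a real job would also need some way to avoid merging the same entries twice.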

I am seeking advice on both approaches and would appreciate your insight.
