hbase-user mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: Large Files in Column Qualifier
Date Sat, 21 Sep 2013 16:38:35 GMT
HBase is not a file store, and it was not designed to be one. Depending on your usage
pattern, I would suggest another approach:

Store your files in large "upload bundles" on HDFS. You will need one or more collector
processes for that. Store the references (upload file name, offset, and size)
in HBase.
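
The bundle-plus-reference idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested design: an in-memory byte stream stands in for the HDFS bundle file, a plain dict stands in for the HBase table, and the names BundleWriter, read_file, and the row keys are all hypothetical.

```python
import io


class BundleWriter:
    """Appends many small files to one large bundle and records where each one landed."""

    def __init__(self, bundle_name, stream, index):
        self.bundle_name = bundle_name
        self.stream = stream  # stand-in for an HDFS output stream (assumption)
        self.index = index    # stand-in for an HBase table: row key -> reference (assumption)

    def append(self, row_key, payload):
        # The current write position is the offset of this file inside the bundle.
        offset = self.stream.tell()
        self.stream.write(payload)
        reference = (self.bundle_name, offset, len(payload))
        self.index[row_key] = reference
        return reference


def read_file(row_key, index, open_bundle):
    """Looks up the reference, then reads exactly `size` bytes at `offset`."""
    bundle_name, offset, size = index[row_key]
    stream = open_bundle(bundle_name)
    stream.seek(offset)
    return stream.read(size)


# Demo with in-memory stand-ins for HDFS and HBase.
bundle = io.BytesIO()
index = {}
writer = BundleWriter("bundle-0001", bundle, index)
writer.append("client42/a.csv", b"hello")
writer.append("client42/b.csv", b"world!")
data = read_file("client42/b.csv", index, lambda name: bundle)
```

In a real deployment the collector would write to an HDFS file and put the (file name, offset, size) triple into an HBase row, so HBase only ever holds small cells while the bytes stay in a few large HDFS files.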

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Geovanie Marquez [geovanie.marquez@gmail.com]
Sent: Saturday, September 21, 2013 6:05 AM
To: user@hbase.apache.org
Subject: Large Files in Column Qualifier

I am evaluating an HBase design that would, on rare occasions, have to store a
1GB file under a column qualifier. Files range from 1KB to 1GB. These files
are raw files being ingested from clients and are to be kept for some period of
time (several years) for quality control purposes. The application does not
depend on these files being in HBase; the files would be used by QA
personnel for data forensics to find out why data behaved unexpectedly in
the app or in our QC processes. That being said, a lot of the reasons I've
read for not maintaining the data in HBase don't apply (compaction
storms, performance degradation), since we can throttle how we place the
data in here.

I'd like to use HBase because it offers potential for indexing the data
later and for analysis across the total data population, which solutions
built directly on HDFS do not. There is also the use case where we receive
tiny KB-sized files more often than not, which on HDFS would run into the
NameNode's memory restrictions. I could HAR these in HDFS, but then indexing
and more flexible options for data analysis go out the window.

Does anyone see some glaring oversight I may be making in this design
consideration?

Thanks for your time.

