phoenix-dev mailing list archives

From "Fustes, Diego"
Subject Bulk load for binary file formats
Date Thu, 07 Jan 2016 10:55:13 GMT
Hi all,

In our project we need to ingest large amounts of data (1 TB stored in custom binary files) into
HBase using Phoenix. At the moment, we do this by converting the binary files to CSV and
running the bulk load tool included in Phoenix. Unfortunately, this process takes too long, because
we first have to store very large files in HDFS (10 TB as CSV) and then run the MapReduce job that
converts these files to HFiles.
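
As a rough illustration of where the text overhead comes from (a minimal sketch, assuming a
hypothetical record of 16 float64 values, which is not our actual file layout), the same record
can be measured in both encodings:

```python
import random
import struct


def encoded_sizes(values):
    """Return (binary_bytes, csv_bytes) for one record of float64 values."""
    # Fixed-width binary: 8 bytes per little-endian double.
    binary = struct.pack(f"<{len(values)}d", *values)
    # CSV: each value as its shortest round-trip decimal text, comma separated.
    csv_line = ",".join(repr(v) for v in values) + "\n"
    return len(binary), len(csv_line.encode("utf-8"))


# Hypothetical sample record; real data will have a different shape.
random.seed(0)
record = [random.uniform(-1e6, 1e6) for _ in range(16)]
binary_bytes, csv_bytes = encoded_sizes(record)
print(f"binary: {binary_bytes} bytes, CSV: {csv_bytes} bytes")
```

The exact ratio depends on the value types and on how compact the binary layout is, so this
sketch is not meant to reproduce the 1 TB to 10 TB growth mentioned above.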

I think it would be considerably faster and more compact to use another file format (for
example, Avro) as intermediate storage for bulk loading. Could this be implemented in an
upcoming release of Phoenix?

Another possibility is that we create the HFiles directly in our code. How complex would that be?

With kind regards,


Diego Fustes, Big Data and Machine Learning Expert
Gran Vía de les Corts Catalanes 130, 11th floor
08038 Barcelona, Spain
Phone: +34 93 43 255 27
