hbase-user mailing list archives

From "Hiller, Dean (Contractor)" <dean.hil...@broadridge.com>
Subject upload 1.8 gig file turns into 13 gig (no replication)
Date Wed, 29 Dec 2010 21:34:42 GMT
I have dfs.replication set to 1 and a 1.8 GB file on HDFS. After my
MapReduce job, which pretty much just puts each row of the file into a
row in the database, I end up with 14.8 GB of usage. 14.8 - 1.8 = 13 GB
used by HBase???
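For scale, assuming decimal gigabytes (10^9 bytes) and the 10 million rows mentioned in the details below, the per-row numbers work out roughly as follows. The HBase side comes to about 7.2 times the source file per row:

```latex
\begin{aligned}
\text{HBase usage} &= 14.8\,\text{GB} - 1.8\,\text{GB} = 13\,\text{GB} \\
\text{per row in HBase} &\approx \frac{13 \times 10^{9}\,\text{B}}{10^{7}\ \text{rows}} = 1300\,\text{B/row} \\
\text{per row in the source file} &\approx \frac{1.8 \times 10^{9}\,\text{B}}{10^{7}\ \text{rows}} = 180\,\text{B/row}
\end{aligned}
```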


After thinking about it a bit, I'm starting to think this may be normal.
Here are the details, just in case...


Each of my 10 million rows has this key:

Key=<accountNo>-<UUID>  // OK, this UUID is extra space I eat up too
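A minimal Java sketch of how a key in this <accountNo>-<UUID> shape might be built and measured. It assumes the UUID comes from java.util.UUID.randomUUID() (the message does not say how it is generated) and uses a hypothetical 10-digit account number; the class name is made up for illustration. The canonical UUID string is always 36 characters, so the "-<UUID>" suffix adds 37 bytes to every key:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class RowKeySize {
    // Builds a row key in the <accountNo>-<UUID> shape from the message.
    // Assumption: the UUID is a random one in its canonical 36-char form.
    static String buildKey(String accountNo) {
        return accountNo + "-" + UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String accountNo = "1234567890"; // hypothetical 10-digit account number
        String key = buildKey(accountNo);
        int keyBytes = key.getBytes(StandardCharsets.UTF_8).length;
        int suffixBytes = keyBytes - accountNo.length();
        System.out.println("key bytes: " + keyBytes);              // 47
        System.out.println("bytes added by -<UUID>: " + suffixBytes); // 37
    }
}
```

Because HBase stores the full row key with every cell, those 37 bytes are paid once per column (five times per row here), not once per row.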


The rest of my code just takes the values from the file....


                  Put put = new Put(key.getBytes());

                  // all below is from the file
                  add(put, "attributes_family", "accountNo",  values[0]);
                  add(put, "attributes_family", "activityId", values[1]); // int
                  add(put, "attributes_family", "random",     values[2]);
                  add(put, "attributes_family", "line",       values[3]); // long string
                  add(put, "attributes_family", "something",  values[4]); // long string


In an RDBMS, column names don't take up any space per row, but in HBase
they do, right? And my UUID is not in the file, so it adds some space as
well. Does this sound about right to people? (I have no idea what the
size would look like if I loaded this into an RDBMS, and of course
indexing, etc. can play a role too.)
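A rough per-cell estimate helps answer that question. The Java sketch below assumes HBase's KeyValue layout, in which each cell stores a 4-byte key length, a 4-byte value length, a 2-byte row length, the row key, a 1-byte family length, the family name, the qualifier, an 8-byte timestamp, a 1-byte type, and then the value. The class name, the 47-byte row key, and the value sizes are hypothetical (values chosen to total about 180 bytes, i.e. 1.8 GB over 10 million rows). It ignores compression, HFile indexes, bloom filters, and the write-ahead log:

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CellSizeEstimate {
    // Fixed bytes per cell in the KeyValue layout:
    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1)
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1; // 20

    static int len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Estimated uncompressed size of one cell. The row key and family
    // name are stored again in every cell, not once per row.
    static long cellBytes(String row, String family, String qualifier, int valueBytes) {
        return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + valueBytes;
    }

    public static void main(String[] args) {
        // Hypothetical 47-byte key: 10-digit account number + "-" + 36-char UUID
        String row = "1234567890-" + "123e4567-e89b-12d3-a456-426614174000";
        String family = "attributes_family";
        String[] qualifiers = {"accountNo", "activityId", "random", "line", "something"};
        // Hypothetical value sizes summing to 180 bytes per row
        int[] valueBytes = {10, 10, 10, 75, 75};

        long perRow = 0;
        long valuesOnly = 0;
        for (int i = 0; i < qualifiers.length; i++) {
            perRow += cellBytes(row, family, qualifiers[i], valueBytes[i]);
            valuesOnly += valueBytes[i];
        }
        long rows = 10_000_000L;
        System.out.println("bytes per row: " + perRow + " (values alone: " + valuesOnly + ")");
        System.out.printf(Locale.ROOT, "estimated total: %.1f GB%n", perRow * rows / 1e9);
    }
}
```

Under these assumptions it reports 638 bytes per row against 180 bytes of values, about 6.4 GB in total, so the repeated row key, family name, and qualifiers alone make the data roughly 3.5 times larger. That does not account for the full 13 GB, but it does confirm that column names and the UUID cost real space. This is also why short family and qualifier names are commonly recommended in HBase schemas.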




