hbase-user mailing list archives

From "tim robertson" <timrobertson...@gmail.com>
Subject Question to speaker (tab file loading) at yesterdays user group
Date Thu, 15 Jan 2009 07:30:20 GMT
Hi all,

I was skyping in yesterday from Europe.
Being half asleep and on a bad wireless connection, I could not always
hear well, and I have some quick questions for the person who was
describing his tab file (CSV?) loading at the beginning.

Could you please quickly summarise the stats you mentioned again?
Number of rows, file size before loading (was it 7 Strings per row?),
size after load, time to load, etc.

Also, could you please quickly summarise your cluster hardware (spec,
ram + number nodes)?

What did you find sped it up?

How many columns per family were you using, and did this affect much
(presumably fewer means fewer region splits, right)?

The reason I ask is that I have around 50G in a tab file (representing
162M rows from MySQL with around 50 fields, mostly ints and strings of
<20 chars) and will be loading it into HBase.  Once this initial import
is done, I will then harvest XML and tab files into HBase directly
(storing the raw XML record or tab file row as well).
I am in early testing (awaiting hardware and fed up with using EC2), so
I am still running code on a laptop with small tests.  I have 6 Dell
boxes (2 proc, 5G memory, SCSI?) being freed up in 3-4 weeks and wonder
what performance I will get.

Thanks,

Tim
