hbase-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Loading into hbase from csv file issue
Date Mon, 03 Oct 2016 15:16:15 GMT
Hi Jean-Marc

I decided to create a composite key, *ticker-date*, from the csv file.

I did this with some manipulation of the csv file:

export IFS=","; sed -i 1d tsco.csv; cat tsco.csv | while read a b c d e f; \
  do echo "TSCO-$a,TESCO PLC,TSCO,$a,$b,$c,$d,$e,$f"; done > temp; \
  mv -f temp tsco.csv

This takes the csv file, tells the shell that the field separator is
IFS=",", drops the header line, reads every field of every line (a, b, c, ...),
creates the composite key TSCO-$a, and adds the stock name and ticker to each
line of the csv file. The whole process can be automated and parameterised.
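As a rough sketch of how this could be parameterised (the function name
add_composite_key and its argument order are made up for illustration, and it
assumes the input columns are Date,Open,High,Low,Close,Volume with one header
line), something like the following awk wrapper would do the same job for any
ticker:

```shell
#!/bin/sh
# Hypothetical helper, not part of HBase: prefix every data line with a
# TICKER-date row key, the stock name and the ticker.
# Usage: add_composite_key TICKER "STOCK NAME" input.csv > output.csv
add_composite_key() {
    ticker=$1
    name=$2
    infile=$3
    # NR > 1 skips the header line; $1 is the date field of the input,
    # and $0 is the untouched original line appended after the new fields.
    awk -F',' -v OFS=',' -v t="$ticker" -v n="$name" \
        'NR > 1 { print t "-" $1, n, t, $0 }' "$infile"
}
```

For example, add_composite_key TSCO "TESCO PLC" tsco.csv > tsco_keyed.csv
writes the keyed lines to a new file and leaves the original untouched, which
makes it easier to rerun than the in-place sed/mv version.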

Once the csv file has been put into HDFS, I run the following command:

$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=',' \
  -Dimporttsv.columns="HBASE_ROW_KEY,stock_info:stock,stock_info:ticker,stock_daily:Date,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume" \
  tsco hdfs://rhes564:9000/data/stocks/tsco.csv
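Since the original problem in this thread was every line sharing one row key,
a quick check on the local csv file before loading may be worth having. The
sketch below (duplicate_keys is a hypothetical name, and it assumes the row
key is the first comma-separated field and contains no embedded commas) prints
any key that appears more than once:

```shell
#!/bin/sh
# Hypothetical pre-load check: list row keys (first CSV field) that occur
# on more than one line. No output means every key is unique.
duplicate_keys() {
    # cut extracts the first field, sort groups equal keys together,
    # and uniq -d prints one copy of each key that is repeated.
    cut -d',' -f1 "$1" | sort | uniq -d
}
```

If duplicate_keys tsco.csv prints nothing, each line should end up as its own
row; any key it does print would be written to the same row more than once.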

The HBase table is created as below:

create 'tsco','stock_info','stock_daily'

and this is the data (2 rows, each with 2 column families and 8 columns in total):

hbase(main):132:0> scan 'tsco', LIMIT => 2
ROW              COLUMN+CELL
 TSCO-1-Apr-08   column=stock_daily:Date, timestamp=1475507091676, value=1-Apr-08
 TSCO-1-Apr-08   column=stock_daily:close, timestamp=1475507091676, value=405.25
 TSCO-1-Apr-08   column=stock_daily:high, timestamp=1475507091676, value=406.75
 TSCO-1-Apr-08   column=stock_daily:low, timestamp=1475507091676, value=379.25
 TSCO-1-Apr-08   column=stock_daily:open, timestamp=1475507091676, value=380.00
 TSCO-1-Apr-08   column=stock_daily:volume, timestamp=1475507091676, value=49664486
 TSCO-1-Apr-08   column=stock_info:stock, timestamp=1475507091676, value=TESCO PLC
 TSCO-1-Apr-08   column=stock_info:ticker, timestamp=1475507091676, value=TSCO
 TSCO-1-Apr-09   column=stock_daily:Date, timestamp=1475507091676, value=1-Apr-09
 TSCO-1-Apr-09   column=stock_daily:close, timestamp=1475507091676, value=333.30
 TSCO-1-Apr-09   column=stock_daily:high, timestamp=1475507091676, value=334.60
 TSCO-1-Apr-09   column=stock_daily:low, timestamp=1475507091676, value=326.50
 TSCO-1-Apr-09   column=stock_daily:open, timestamp=1475507091676, value=331.10
 TSCO-1-Apr-09   column=stock_daily:volume, timestamp=1475507091676, value=24877341
 TSCO-1-Apr-09   column=stock_info:stock, timestamp=1475507091676, value=TESCO PLC
 TSCO-1-Apr-09   column=stock_info:ticker, timestamp=1475507091676, value=TSCO


What do you think?

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 3 October 2016 at 15:10, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
wrote:

> Hi Mich,
>
> As you said, it's most probably because it's all the same key... If you
> want to be 200% sure, just alter VERSIONS => '1' to be greater (like, 10)
> and scan all the versions of the cells. You should see the others.
>
> JMS
>
> 2016-10-03 3:41 GMT-04:00 Mich Talebzadeh <mich.talebzadeh@gmail.com>:
>
> > Hi,
> >
> > When I use the command line utility ImportTsv to load a file into HBase
> > with the following table format
> >
> > describe 'marketDataHbase'
> > Table marketDataHbase is ENABLED
> > marketDataHbase
> > COLUMN FAMILIES DESCRIPTION
> > {NAME => 'price_info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> > 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL
> > => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE =>
> > 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > 1 row(s) in 0.0930 seconds
> >
> >
> > hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
> >   -Dimporttsv.separator=',' \
> >   -Dimporttsv.columns="HBASE_ROW_KEY,stock_daily:ticker,stock_daily:tradedate,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume" \
> >   tsco hdfs://rhes564:9000/data/stocks/tsco.csv
> >
> > There are 1200 rows in the csv file, *but it only loads the first
> > row!*
> >
> > scan 'tsco'
> > ROW          COLUMN+CELL
> >  Tesco PLC   column=stock_daily:close, timestamp=1475447365118, value=325.25
> >  Tesco PLC   column=stock_daily:high, timestamp=1475447365118, value=332.00
> >  Tesco PLC   column=stock_daily:low, timestamp=1475447365118, value=324.00
> >  Tesco PLC   column=stock_daily:open, timestamp=1475447365118, value=331.75
> >  Tesco PLC   column=stock_daily:ticker, timestamp=1475447365118, value=TSCO
> >  Tesco PLC   column=stock_daily:tradedate, timestamp=1475447365118, value= 3-Jan-06
> >  Tesco PLC   column=stock_daily:volume, timestamp=1475447365118, value=46935045
> > 1 row(s) in 0.0390 seconds
> >
> > Is this because the hbase_row_key --> Tesco PLC is the same for all? I
> > thought that the row key can be anything.
> >
> > Thanks
> >
> > Dr Mich Talebzadeh
> >
>
