hbase-user mailing list archives

From: Omkar Joshi <Omkar.Jo...@lntinfotech.com>
Subject: RE: Loading text files from local file system
Date: Wed, 17 Apr 2013 09:11:24 GMT
Yeah, DFS space is a constraint.

I'll check the options you suggested.

Regards,
Omkar Joshi

-----Original Message-----
From: Suraj Varma [mailto:svarma.ng@gmail.com] 
Sent: Wednesday, April 17, 2013 2:07 PM
To: user@hbase.apache.org
Subject: Re: Loading text files from local file system

Maybe I misunderstood your constraint ... are you saying that your DFS
itself is space-constrained because of the file size & replication? If so,
how about setting dfs.replication to 1 for the job?
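
For example, something along these lines (a rough sketch only -- the local
path is a placeholder, and it assumes both the fs shell and the importtsv
job honor the generic -D override):

  # put the raw file with a single replica instead of the cluster default
  ${HADOOP_HOME}/bin/hadoop fs -Ddfs.replication=1 -put /local/path/product_6.txt \
      hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/

Adding the same -Ddfs.replication=1 to the importtsv command you quoted
below should keep the bulk-output HFiles single-replica as well.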

There are other options, like chopping up your file and processing it
piecemeal ... or perhaps customizing LoadIncrementalHFiles to process
compressed input files, and so forth ...
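
The piecemeal route could look roughly like this (a rough sketch -- the
chunk size and local path are placeholders; each chunk is put, bulk-loaded
and then removed from HDFS before the next one):

  split -l 5000000 product_6.txt product_chunk_
  for f in product_chunk_*; do
    # copy one chunk, single-replica
    ${HADOOP_HOME}/bin/hadoop fs -Ddfs.replication=1 -put $f \
        hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/$f
    # build the HFiles for this chunk
    HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
    ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar importtsv \
        '-Dimporttsv.separator=;' \
        -Dimporttsv.columns=HBASE_ROW_KEY,PRODUCT_INFO:NAME,PRODUCT_INFO:CATEGORY,PRODUCT_INFO:GROUP,PRODUCT_INFO:COMPANY,PRODUCT_INFO:COST,PRODUCT_INFO:COLOR,PRODUCT_INFO:BLANK_COLUMN \
        -Dimporttsv.bulk.output=hdfs://cldx-1139-1033:9000/hbase/storefileoutput_$f \
        PRODUCTS hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/$f
    # move the HFiles into the PRODUCTS table, then reclaim the space
    HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
    ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar completebulkload \
        hdfs://cldx-1139-1033:9000/hbase/storefileoutput_$f PRODUCTS
    ${HADOOP_HOME}/bin/hadoop fs -rmr \
        hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/$f \
        hdfs://cldx-1139-1033:9000/hbase/storefileoutput_$f
  done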

See if the dfs.replication + hfile.compression option works for you first.
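
On the importtsv command you quoted below, the additions would be along
these lines (snappy assumes the native Snappy libraries are installed on
every node; "gz" is a slower fallback that needs no extra install):

  # extra -D options alongside the importtsv.* ones
  -Ddfs.replication=1 -Dhfile.compression=snappy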
--Suraj



On Wed, Apr 17, 2013 at 1:00 AM, Suraj Varma <svarma.ng@gmail.com> wrote:

> Have you considered using hfile.compression, perhaps with snappy
> compression?
> See this thread:
> http://grokbase.com/t/hbase/user/10cqrd06pc/hbase-bulk-load-script
> --Suraj
>
>
>
> On Tue, Apr 16, 2013 at 9:31 PM, Omkar Joshi <Omkar.Joshi@lntinfotech.com> wrote:
>
>> The background thread is here :
>>
>>
>> http://mail-archives.apache.org/mod_mbox/hbase-user/201304.mbox/%3CE689A42B73C5A545AD77332A4FC75D8C1EFBE84153@VSHINMSMBX01.vshodc.lntinfotech.com%3E
>>
>> Following are the commands that I'm using to load files onto HBase :
>>
>> HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
>> ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar importtsv \
>> '-Dimporttsv.separator=;' \
>> -Dimporttsv.columns=HBASE_ROW_KEY,PRODUCT_INFO:NAME,PRODUCT_INFO:CATEGORY,PRODUCT_INFO:GROUP,PRODUCT_INFO:COMPANY,PRODUCT_INFO:COST,PRODUCT_INFO:COLOR,PRODUCT_INFO:BLANK_COLUMN \
>> -Dimporttsv.bulk.output=hdfs://cldx-1139-1033:9000/hbase/storefileoutput_6 \
>> PRODUCTS hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/product_6.txt
>>
>> HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
>> ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar \
>> completebulkload hdfs://cldx-1139-1033:9000/hbase/storefileoutput_6 PRODUCTS
>>
>> As seen, the text files to be loaded into HBase first need to be copied to
>> HDFS. Given our infrastructure constraints/limitations, I'm running into
>> space issues: the data in the text files is around 20 GB, and replication
>> is consuming a lot of DFS space.
>>
>> Is there a way to load a text file directly from the local file system
>> into HBase?
>>
>> Regards,
>> Omkar Joshi
>>
>> ________________________________
>> The contents of this e-mail and any attachment(s) may contain
>> confidential or privileged information for the intended recipient(s).
>> Unintended recipients are prohibited from taking action on the basis of
>> information in this e-mail and using or disseminating the information, and
>> must notify the sender and delete it from their system. L&T Infotech will
>> not accept responsibility or liability for the accuracy or completeness of,
>> or the presence of any virus or disabling code in this e-mail.
>>
>
>
