hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Problems while exporting from Hbase to CSV file
Date Thu, 27 Jun 2013 22:32:16 GMT
Phoenix, Hive, Pig, Java would all work. 
But to Azuryy Yu's post... 

The OP is doing a simple scan() to get rows. 
If the OP is hitting an OOM exception then it's a code issue on the part of the OP. 
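
To illustrate the point about client code: a scan hands rows back through an iterator, so client memory stays flat as long as each row is written out as soon as it is read rather than collected in a list first. Below is a minimal, HBase-free sketch in Java. StreamingExport and export are hypothetical names, and the Iterator<List<String>> is a stand-in for iterating over a ResultScanner. In real client code, Scan.setCaching(n) only sets how many rows each round trip fetches; the shape of the loop stays the same.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of HBase: streams rows to a Writer one at a time.
final class StreamingExport {

    // Writes every row as one comma-separated line and returns the row count.
    // Only the current row is held in memory; nothing is accumulated.
    static long export(Iterator<List<String>> rows, Writer out) {
        long count = 0;
        try {
            while (rows.hasNext()) {
                List<String> row = rows.next();   // stand-in for one scanned row
                out.write(String.join(",", row)); // assumes values need no CSV escaping
                out.write('\n');
                count++;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

If the export still runs out of memory with a loop like this, the likely culprits are a very large caching value or rows being retained somewhere else in the client.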

On Jun 27, 2013, at 2:22 AM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> Sorry, maybe Phoenix is not suitable for you.
> On Thu, Jun 27, 2013 at 3:21 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
>> 1) Use Scan.setCaching() to specify the number of rows that are fetched
>> and cached for the scanner at a time.
>>    Also, what is your block cache size?
>>    But if the OOM is on the client, not the server side, then I don't think
>> it is Scan related; please check your client code.
>> 2) HBase cannot supply default values, but you can fill them in on your
>> client while iterating over each Result.
>> Also, you can use Phoenix, which is a good fit for your scenario.
>> https://github.com/forcedotcom/phoenix
>> On Thu, Jun 27, 2013 at 3:11 PM, Vimal Jain <vkjk89@gmail.com> wrote:
>>> Hi,
>>> I am trying to export from HBase to a CSV file.
>>> I am using the "Scan" class to scan all the data in the table,
>>> but I am facing some problems while doing it.
>>> 1) My table has around 1.5 million rows and around 150 columns in each
>>> row, so I cannot use the default Scan() constructor, as it will scan the
>>> whole table in one go, which results in an OutOfMemory error in the client
>>> process. I heard of using setCaching() and setBatch(), but I am not able
>>> to understand how they would solve the OOM error.
>>> I thought of providing startRow and stopRow in the Scan object, but I want
>>> to scan the whole table, so how would that help?
>>> 2) As HBase stores data for a row only when we explicitly provide it, and
>>> there is no concept of a default value as found in an RDBMS, I want to have
>>> each and every column in the CSV file I generate for every user. In case
>>> column values are not present in HBase, I want to use default values for
>>> them (I have a list of default values for each column). Is there any method
>>> in the Result class or any other class to accomplish this?
>>> Please help here.
>>> --
>>> Thanks and Regards,
>>> Vimal Jain
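
On question 2 in the quoted thread (filling in defaults on the client), here is a small sketch of one way it could look. It deliberately avoids the HBase API so it stays self-contained: CsvDefaults, toCsvLine, and escape are hypothetical names, and the row map is assumed to have been built from a Result, where a lookup such as Result.getValue(family, qualifier) returns null for a missing cell. The ordered defaults map doubles as the CSV column list, so every line gets the same columns.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of HBase: builds one CSV line per row,
// substituting a per-column default wherever the row has no value.
final class CsvDefaults {

    // defaults: column name -> default value, in CSV column order
    //           (use a LinkedHashMap so the order is stable).
    // row:      column name -> value actually stored for this row.
    static String toCsvLine(Map<String, String> defaults, Map<String, String> row) {
        StringJoiner line = new StringJoiner(",");
        for (Map.Entry<String, String> column : defaults.entrySet()) {
            String value = row.get(column.getKey());
            if (value == null) {
                value = column.getValue(); // column absent in this row
            }
            line.add(escape(value));
        }
        return line.toString();
    }

    // Quotes a field if it contains a comma, quote, or line break.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }
}
```

For example, with defaults {name=unknown, age=0, city=n/a} and a row that only stores name and city, the age field in the output line falls back to 0.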
