hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Export / Import and table splits
Date Tue, 07 May 2013 23:11:00 GMT
I don't see much value in duplicating the table's structure, but IMHO, the jury is still out.



On May 7, 2013, at 6:02 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> @Mohammad: The end goal is really more about the splits than about
> the model, so I don't think Lars' options are good for this use case.
> @Mike: I agree that things were not configured correctly. The user should
> have split the table before doing the import. I like the idea of
> looking at the files to get the region boundaries. That way you don't
> need to have the source_table still there...
> 
> So we have 2 different things here:
> 1) a shell command to duplicate a table's structure
> 2) an option on the import command to split the table's regions based on
> the file names.
> 
> If we agree on that, I will open one JIRA for each...
> 
> JM
> 
> 2013/5/7 Michael Segel <michael_segel@hotmail.com>:
>> Silly question...
>> 
>> If you're doing a simple export, then you end up with all of your prior regions as separate files in a directory, right?
>> 
>> So in theory, you could find the first row and the last complete row of each file and then do your pre-splits based on the start keys and end keys that you find.
>> 
>> That would be your tool so to speak.
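Mike's suggestion can be sketched as follows. This is a minimal Python illustration, not part of HBase: it assumes the first and last row key of each exported file have already been read out (for example from the SequenceFiles that Export writes; that step is not shown), and that each non-empty file covers one contiguous, non-overlapping key range, such as the rows of one source region. The function name `splits_from_export_files` is hypothetical.

```python
from typing import List, Optional, Tuple

KeyRange = Tuple[bytes, bytes]  # (first_row, last_row) of one exported file


def splits_from_export_files(ranges: List[Optional[KeyRange]]) -> List[bytes]:
    """Derive pre-split keys from per-file (first_row, last_row) pairs.

    Hypothetical helper. Empty files are passed as None and skipped.
    Files are ordered by their first row; the first row of every file
    except the lowest one becomes a split point.
    """
    ordered = sorted(r for r in ranges if r is not None)
    # Ranges are inclusive, so a file starting at or before the previous
    # file's last row means the files are not clean per-region dumps.
    for (_, last_a), (first_b, _) in zip(ordered, ordered[1:]):
        if first_b <= last_a:
            raise ValueError("overlapping key ranges at %r / %r" % (last_a, first_b))
    return [first for first, _ in ordered[1:]]


# Hypothetical key ranges read from four exported files (one of them empty).
files = [(b"n", b"s"), (b"a", b"f"), None, (b"g", b"m")]
print(splits_from_export_files(files))  # prints: [b'g', b'n']
```

Because a region's start key is inclusive, the first row of each file (other than the lowest) is a natural split point, and the resulting keys could be handed to the `SPLITS` option of the shell's `create` command. Raising on overlap is a deliberate choice: overlapping ranges suggest the files do not map one-to-one onto regions, so their boundaries should not be trusted as split points.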
>> 
>> But as to the point that reading these files back in will crash your RS (region servers) and HBase?
>> That doesn't sound like it's well tuned or right.
>> 
>> HTH
>> -Mike
>> 
>> On May 7, 2013, at 5:29 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> 
>>> I am not aware of a tool which can pre-split a table using another table's
>>> region boundaries as a template.
>>> 
>>> Such a tool would be nice to have.
>>> 
>>> Cheers
>>> 
>>> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> When we do an export, we only export the data. Then when
>>>> we import it back, we need to make sure the table is
>>>> pre-split correctly; otherwise we might hotspot some servers.
>>>> 
>>>> If you simply export and then import without pre-splitting at all, you
>>>> will most probably bring some servers down because they will be
>>>> overwhelmed with splits and compactions.
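The hotspotting concern can be illustrated with a small model, written in plain Python rather than against the HBase client API. It assumes the usual HBase layout: a table's regions are delimited by sorted split keys, region start keys are inclusive, and row keys compare as unsigned bytes (which is also how Python `bytes` values compare). The helper names and sample keys are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from typing import List, Sequence


def region_index(split_keys: Sequence[bytes], row_key: bytes) -> int:
    """Return the index of the region that holds row_key.

    split_keys must be sorted. Region 0 covers keys below split_keys[0];
    region i (i >= 1) starts at split_keys[i-1], inclusive, and runs up to
    the next split key, if any. bisect_right computes exactly that index.
    """
    return bisect_right(split_keys, row_key)


def region_load(split_keys: Sequence[bytes], row_keys: List[bytes]) -> Counter:
    """Count how many of the given rows land in each region."""
    return Counter(region_index(split_keys, k) for k in row_keys)


rows = [b"row-%03d" % i for i in range(300)]
print(region_load([], rows))                        # a single region takes every row
print(region_load([b"row-100", b"row-200"], rows))  # the rows spread over three regions
```

With no split keys every row maps to region 0, so one region (and the server hosting it) absorbs the whole import along with the splits and compactions it triggers; with two split keys the same 300 rows divide evenly across three regions.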
>>>> 
>>>> Do we have any tool to pre-split a table the same way another table is
>>>> already pre-split?
>>>> 
>>>> Something like
>>>>> duplicate 'source_table', 'target_table'
>>>> 
>>>> Which would create a new table called 'target_table' with exactly the
>>>> same parameters as 'source_table' and the same region boundaries?
>>>> 
>>>> If we don't have one, would it be useful to have one?
>>>> 
>>>> Or even something like:
>>>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
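The region-boundary half of the `duplicate` / `SPLITS_MODEL` idea can already be expressed on the target side, because the shell's `create` accepts a `SPLITS => [...]` list. A minimal Python sketch, assuming the source table's region start keys have already been fetched (in the Java client, for instance, via `RegionLocator.getStartKeys()`, or `HTable.getStartKeys()` in older clients), turns them into such a command. It covers only the splits, not the other table parameters, and it assumes printable ASCII row keys; all function names are hypothetical.

```python
from typing import Iterable, List


def split_keys_from_start_keys(start_keys: Iterable[bytes]) -> List[bytes]:
    """Turn a source table's region start keys into split keys.

    The first region of an HBase table starts at the empty key, so it is
    dropped; every other start key becomes a split point for the new table.
    """
    return sorted(k for k in start_keys if k)


def shell_literal(key: bytes) -> str:
    """Render a row key as a single-quoted HBase shell (Ruby) string.

    Assumption: keys are printable ASCII. Binary keys would need an
    escaping scheme that this sketch deliberately does not attempt.
    """
    text = key.decode("ascii")  # raises UnicodeDecodeError for non-ASCII bytes
    if not text.isprintable():
        raise ValueError("non-printable row key: %r" % key)
    # In a Ruby single-quoted string, only \\ and \' are escape sequences.
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def create_command(table: str, families: List[str], split_keys: List[bytes]) -> str:
    """Build a shell `create` command that pre-splits `table` at `split_keys`."""
    fams = ", ".join("'%s'" % f for f in families)
    splits = ", ".join(shell_literal(k) for k in split_keys)
    return "create '%s', %s, SPLITS => [%s]" % (table, fams, splits)


# Hypothetical start keys of 'source_table', which has four regions.
start_keys = [b"", b"g", b"n", b"t"]
print(create_command("target_table", ["f1"], split_keys_from_start_keys(start_keys)))
# prints: create 'target_table', 'f1', SPLITS => ['g', 'n', 't']
```

Pasting the printed command into the shell would create `target_table` with the same region boundaries that `source_table` had when its start keys were read; column family settings and other table parameters would still have to be copied separately.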
>>>> 
>>>> 
>>>> JM
>>>> 
>> 
> 

