hbase-user mailing list archives

From Andrew Nguyen <andrew-lists-hb...@ucsfcti.org>
Subject Re: Running an jython import job
Date Fri, 23 Jul 2010 21:25:14 GMT
Thanks for the info.  I actually used that blog post as a starting point for my work with jython.

I will also take a look at the bulk loading you referenced below.  We are currently only doing
single-cf imports.
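
For anyone reading this thread later, here is a rough sketch of why turning autoflush off and enlarging the write buffer (as discussed in the quoted messages below) tends to help a put loop. It is a toy model only: `BufferedTable`, `import_rows`, and the byte accounting are hypothetical stand-ins written for illustration, not the HBase client API, and the "round trips" are simply counted rather than sent anywhere.

```python
class BufferedTable(object):
    """Toy stand-in for a client-side write buffer (hypothetical, not HBase)."""

    def __init__(self, write_buffer_bytes, auto_flush=False):
        self.write_buffer_bytes = write_buffer_bytes
        self.auto_flush = auto_flush
        self.pending = []       # puts waiting to be sent
        self.pending_bytes = 0  # rough size of the pending puts
        self.round_trips = 0    # simulated trips to the cluster
        self.rows_sent = 0

    def put(self, row, value):
        # Queue the put; send the batch when autoflush is on or the buffer is full.
        self.pending.append((row, value))
        self.pending_bytes += len(row) + len(value)
        if self.auto_flush or self.pending_bytes >= self.write_buffer_bytes:
            self.flush_commits()

    def flush_commits(self):
        # Send everything queued so far as one simulated round trip.
        if not self.pending:
            return
        self.round_trips += 1
        self.rows_sent += len(self.pending)
        self.pending = []
        self.pending_bytes = 0


def import_rows(table, rows):
    """Put every row, flush the remainder, and return the round-trip count."""
    for row, value in rows:
        table.put(row, value)
    table.flush_commits()  # send whatever is still buffered
    return table.round_trips


# 1000 rows of 20 bytes each (10-byte key plus 10-byte value).
rows = [("row-%06d" % i, "v" * 10) for i in range(1000)]
unbuffered = import_rows(BufferedTable(0, auto_flush=True), rows)
buffered = import_rows(BufferedTable(2000), rows)
```

In this model the autoflush run makes one trip per row, while the buffered run makes one trip per full buffer, so the gap grows with the buffer size. Real timings depend on the cluster, which is why measuring first, as suggested below, still applies.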

--Andrew

--
Andrew Nguyen
andrew@ucsfcti.org

The information contained in this electronic message and any attachments to this message are
intended for the exclusive use of the addressee(s) and may contain confidential or privileged
information.  Any unauthorized review, dissemination, distribution, or copying of this communication
is prohibited.  If you are not the intended recipient, please notify the sender immediately
by reply e-mail, and destroy all copies of this message and any attachments from your files.

On Jul 23, 2010, at 10:31 AM, Stack wrote:

> On Fri, Jul 23, 2010 at 10:18 AM, Andrew Nguyen
> <andrew-lists-hbase@ucsfcti.org> wrote:
>> 
>> The jython page on the wiki was extremely useful.  I actually had never used jython
>> before but am a big fan of python for getting stuff up quickly so it seemed to be a natural
>> progression.  Having said that, I am looking at importing a ton of rows (not sure how many,
>> but hundreds of millions to billions).  Are there any good examples on doing this as efficiently
>> as possible?  And, how does jython compare to a pure Java approach?
>> 
> 
> There is an old blog of Ryan's from back when he was doing all he
> could to not sully his paws with dirty java:
> http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
> It's an old post.  Jython may have come on since then.
> 
>> Currently, I have a for loop just calling table.put(p) repeatedly.  I also have WAL
>> disabled, autoflush set to false, and increased the buffer.  Anything else I should consider?
>> 
> 
> You are on the right track.  You might want to move to java but do the
> timing first.
> 
> There is also http://hbase.apache.org/docs/r0.20.5/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
> which has been buggy up to this point, though it should work now.  It's
> good if you are doing single-column-family-only imports.  Usually you
> can see an order-of-magnitude improvement in speed when bulk inserting.  This
> bulk load facility got redone completely in TRUNK, and for sure it
> works now.  It's super fancy; you can even bulk load into a running
> table; read more here:
> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> 
> St.Ack
> 
>> Thanks!
>> 
>> --Andrew
>> 
>> --
>> Andrew Nguyen
>> andrew@ucsfcti.org
>> 
>> 
>> On Jul 23, 2010, at 10:05 AM, Stack wrote:
>> 
>>> This is just our noisy client talking about the caching of region
>>> locations out on the cluster (You are at DEBUG level).  Turn off DEBUG
>>> in client if you'd rather not see the messages -- see the FAQ for how
>>> -- or just ignore.  When they turn WARN or ERROR, start paying
>>> attention.
>>> 
>>> Did the jython page up on the wiki help?
>>> Yours,
>>> St.Ack
>>> 
>>> On Fri, Jul 23, 2010 at 9:58 AM, Andrew Nguyen
>>> <andrew-lists-hbase@ucsfcti.org> wrote:
>>>> Hello all,
>>>> 
>>>> I am running a job from jython that is importing time series data into HBase.
>>>> I started to see the following messages and wanted to dive deeper to find out if they are
>>>> true errors or just debug messages:
>>>> 
>>>> 10/07/23 09:51:07 DEBUG client.HConnectionManager$TableServers: Reloading region subset,a40506-2016/07/23-20:33:30.296,1279902520534 location because regionserver didn't accept updates; tries=0 of max=10, waiting=1000ms
>>>> 10/07/23 09:51:08 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.10.11.3:60020
>>>> 10/07/23 09:51:08 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: No server address listed in .META. for region subset,a40506-2016/07/24-07:00:35.528,1279903897169
>>>> 10/07/23 09:51:09 DEBUG client.HConnectionManager$TableServers: Cached location for subset,a40506-2016/07/24-07:00:35.528,1279903897169 is 10.10.11.2:60020
>>>> 
>>>> I did some searches on google and this seems to point at a potential lack
>>>> of memory.  Currently, HBase is set up with a heap of 2G for each slave, and there are 6 slaves.
>>>> Each slave has a total of 8G of RAM installed.  If you guys have any guidance on what other
>>>> settings I should look for, please let me know.
>>>> 
>>>> Thanks!
>>>> 
>>>> --Andrew
>>>> 
>>>> --
>>>> Andrew Nguyen
>>>> andrew@ucsfcti.org
>>>> 
>>>> 