hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Writting bottleneck in HBase ?
Date Wed, 23 Nov 2016 16:17:30 GMT
bq. it calls the persistence method asynchronously

Assuming the persistence method is still executing when the next threshold
value is reached, do you have other threads available to do the persistence?
If so, how many threads can potentially run at the same time?
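
To make the concurrency question concrete, here is a minimal sketch of one way to cap how many persistence calls run at once. It is not taken from the original application: the class name, the pool size and the queue size are all hypothetical, and the sleep only stands in for the real table.put() work.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPersistence {
    // Hypothetical pool: at most `threads` workers, at most `queued` batches waiting.
    // When the queue is full, CallerRunsPolicy makes the submitting (decoding) thread
    // run the batch itself, which slows decoding down instead of piling up work.
    static ThreadPoolExecutor newPool(int threads, int queued) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queued),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits `batches` simulated persistence calls and returns the highest number
    // that were observed running at the same time.
    static int peakConcurrency(int threads, int queued, int batches) {
        ThreadPoolExecutor pool = newPool(threads, queued);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < batches; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // stand-in for the real persistence work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```

With this setup the answer to "how many at the same time" is at most threads + 1: the pool workers plus the submitting thread whenever the queue is full.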

How many regions does the table have?

What's the distribution of parameter IDs in the input file? One possible case
is that the parameter IDs are sequential w.r.t. region boundaries, so the
writes end up going region by region.
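
The region-by-region case can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the zero-padded 4-digit row keys, the three split keys and the class name are made up, and a real check would use the table's actual region boundaries.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class RegionSpread {
    // Index of the region that would hold rowKey, given sorted split keys.
    // Region 0 is everything below splits[0]; region i starts at splits[i - 1].
    // For plain ASCII keys, String ordering matches byte-wise row key ordering.
    static int regionFor(String rowKey, String[] splits) {
        int pos = Arrays.binarySearch(splits, rowKey);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Number of distinct regions touched by the writes rowKeys[from..to).
    static int distinctRegions(String[] rowKeys, int from, int to, String[] splits) {
        Set<Integer> seen = new HashSet<>();
        for (int i = from; i < to; i++) {
            seen.add(regionFor(rowKeys[i], splits));
        }
        return seen.size();
    }

    // Hypothetical parameter IDs "0000", "0001", ... in input-file order.
    static String[] sequentialKeys(int n) {
        String[] keys = new String[n];
        for (int i = 0; i < n; i++) {
            keys[i] = String.format("%04d", i);
        }
        return keys;
    }
}
```

With 2000 sequential keys and splits at "0500", "1000" and "1500", any window of 100 consecutive writes starting at key 0 lands in a single region, even though the whole file eventually touches all four. In that situation only one region server is busy at any given moment.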

On Wed, Nov 23, 2016 at 8:01 AM, schausson <schausson@softera.fr> wrote:

> Hi,
>
> I am new to HBase and I'm facing performance issues ...
>
> Short story: I want to persist 10,000,000 values in HBase, and it takes the
> same time on a basic sandbox (HDP Hadoop sandbox with a single region server
> node) as it does on our "production" cluster (which comprises 12 region
> servers with higher capabilities than my developer's laptop...)
>
> Detailed case:
>
> Basically, the use case is: my Java application receives a binary file that
> contains timeseries, decodes them and stores the decoded data into a single
> HBase table.
> HBase table design: we store one parameter per row, and we create one
> column per timestamp to store the associated value.
> My test case is based on an input file that yields ~2000 rows/parameters
> containing ~5000 values per row (=> around 10,000,000 values to store in my
> HBase table in the end)
>
> For this purpose, my application uses the HBase client API.
> Basically, my code proceeds as follows: it decodes the parameters' timeseries
> from the input file and stores these values in a map<paramId, List<value>>.
>
> When it reaches 10000 values (a threshold that may be changed), it calls the
> persistence method asynchronously and continues the decoding operation until
> the end of the input file.
> The persistence method proceeds like this (simplified code):
>
> for (paramId : map.keySet()) {
>         // one row per parameter
>         Put put = new Put(paramId);
>         // one column per timestamp
>         for (value : map.get(paramId)) {
>                 put.addColumn(family, columnName, value);
>         }
>         // one call per row
>         table.put(put);
> }
>
> Choosing a threshold value of 10000 leads to ~1000 calls to the persistence
> method. Each call generates 2000 calls to the table.put() method, each put
> containing ~5 columns.
>
> When I run this on the HDP sandbox on my laptop (single region server), it
> completes in less than 2 minutes.
> When I run this on our production cluster (12 region servers), it completes
> in 2 minutes and sometimes more.
>
> My question is: is the writing load distributed across all the region
> servers? Obviously not... What should I do if I want my application to
> scale properly when we add additional region servers?
>
> I don't know if I gave enough information, so please do not hesitate to ask
> me for more details if needed, but any help would be greatly appreciated...
>
> Regards
>
> Sebastien
