hbase-user mailing list archives

From Dmitriy Lyfar <dly...@gmail.com>
Subject Re: Problems with write performance (25kb rows)
Date Mon, 04 Jan 2010 13:18:41 GMT
Hello, Stack

> > Of course I will insert fewer rows per second in the case of 25Kb rows,
> > but throughput should stay the same. Now I'm trying to run several client
> > instances, each of which inserts 100K records (each record is 25Kb). The
> > execution time grows with each client.
> >
> >
> > >
> > > In general, our client ain't too good at multiplexing because of
> > > limitations such as the one noted above (our client does not yet do
> > > nio). If you want to test cluster performance, run multiple concurrent
> > > clients, each in its own process. MapReduce is good for doing this. See
> > > the PerformanceEvaluation code for a sample MR job that floats many
> > > clients doing different loading types.
> > >
> >
> > MapReduce is a good idea, but we don't actually have data stored in
> > Hadoop; we process data in real time and insert it into HBase. So I think
> > it would be inefficient to write our data into Hadoop and then run a
> > MapReduce job to insert that data into the tables.
> >
> >
> Agreed.  Was just suggesting it as a way of parallelizing clients.  I
> presume that the source of the data feed is multiple, that you can run
> multiple instances of your upload process?
>

Yes, I think I can run multiple instances of the uploader.
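A minimal Python sketch of running several uploader instances as separate processes, following the advice to give each client its own process. `upload_rows` and `run_uploaders` are hypothetical names; the real HBase put call is left as a placeholder comment because it depends on the client library, and reading "25Kb" as 25 KiB is an assumption. Each worker reports its own elapsed time, so per-process timings can be compared directly.

```python
import time
from concurrent.futures import ProcessPoolExecutor

# Assumption: "25Kb" rows are read as 25 KiB; adjust to the real row size.
ROW_SIZE = 25 * 1024


def upload_rows(worker_id: int, n_rows: int, row_size: int = ROW_SIZE) -> tuple[int, int, float]:
    """Hypothetical uploader body; one instance runs in each worker process.

    Returns (worker_id, bytes prepared, elapsed seconds). A real uploader
    would open its own HBase client connection at the top of this function.
    """
    start = time.perf_counter()
    sent = 0
    for i in range(n_rows):
        # Prefix keys with the worker id so concurrent workers never collide.
        row_key = f"w{worker_id:02d}-{i:08d}".encode()
        value = bytes(row_size)  # zero-filled placeholder payload
        # Placeholder: issue the actual put(row_key, value) call here.
        sent += len(row_key) + len(value)
    return worker_id, sent, time.perf_counter() - start


def run_uploaders(n_workers: int, rows_per_worker: int) -> list[tuple[int, int, float]]:
    """Start n_workers separate processes, each running upload_rows independently."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(upload_rows, w, rows_per_worker) for w in range(n_workers)]
        return [f.result() for f in futures]
```

To use it, call `run_uploaders(4, 100_000)` under an `if __name__ == "__main__":` guard (required when worker processes are started with the spawn method) and compare the elapsed time each worker returns.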


>
> > >
> > Time with several clients is growing. For example, when I run four
> > processes, each with one inserter thread, I get the following results:
> > 1) Thread-1 finished its work in 189 sec
> > 2) Thread-1 finished its work in 198 sec
> > 3) Thread-1 finished its work in 206 sec
> > 4) Thread-1 finished its work in 208 sec
> > I.e. each subsequent process takes longer than the previous one. These
> > timings are for a test where each process inserts 100K 25Kb rows with the
> > WAL on. Btw, the WAL has a great impact on performance as I increase the
> > row size: this test takes about 80 sec with the WAL off. Also, when
> > running several clients, the nodes still seem almost idle.
> >
>
> Oh, how many regions in your cluster?  At the start, all clients will be
> hitting a single region (and thus a single server).  Check your master
> console at port 60010.
>
> You could rerun a second upload just after a first upload.


As I said, I have 6 nodes besides the master node, with roughly 235 regions
per node: 1406 regions total.
Throughput is about 50 Mb/sec without the WAL and about 15 Mb/sec with the
WAL on. When I run the clients serially (i.e. only one script is working at
any moment), the time is almost stable and does not grow.


> See what the
> numbers are like uploading into a table that is pre-split?


Sorry, what do you mean by pre-split? Do you mean splitting the regions before
running the script?
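If "pre-split" means creating the table with region boundaries already in place, then uploads would be spread over several regions (and region servers) from the first row, instead of all going to the single region a new table starts with. A Python sketch of the idea, assuming row keys begin with a uniformly distributed fixed-width hex string (for example a hash); `hex_split_keys` and `region_for` are hypothetical helpers, and how split keys are passed to HBase at table creation depends on the HBase version in use.

```python
import bisect


def hex_split_keys(n_regions: int, width: int = 8) -> list[bytes]:
    """Hypothetical helper: n_regions - 1 evenly spaced split keys over a
    fixed-width lowercase hex keyspace ("00000000" .. "ffffffff" for width 8)."""
    space = 16 ** width
    return [format(space * i // n_regions, f"0{width}x").encode() for i in range(1, n_regions)]


def region_for(row_key: bytes, split_keys: list[bytes]) -> int:
    """Index of the region a row key falls into.

    Region 0 holds keys below split_keys[0]; region i (i >= 1) starts at
    split_keys[i - 1], inclusive, like an HBase region start key.
    """
    return bisect.bisect_right(split_keys, row_key)


if __name__ == "__main__":
    splits = hex_split_keys(4)
    keys = [b"1a2b3c4d", b"5e6f7a8b", b"9c0d1e2f", b"d3e4f5a6"]
    # Without split keys every row lands in region 0 (one server takes all writes).
    print([region_for(k, []) for k in keys])
    # With three split keys the same rows land in four different regions.
    print([region_for(k, splits) for k in keys])
```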


-- 
Regards, Lyfar Dmitriy
