hbase-user mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: HBase Writes With Large Number of Columns
Date Tue, 26 Mar 2013 06:19:50 GMT
Hi Pankaj

Is it possible for you to profile the RS (region server) when this happens?
Either Thrift is adding overhead, or the code is spending more time
somewhere else.

As you said, there may be a slight decrease in put performance because more
values now have to go in, but it should not be this significant. We can work
from the profile output and check what we are doing.

Regards
Ram

On Tue, Mar 26, 2013 at 5:19 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> For a total of 1.5kb with 4 columns = 384 bytes/column
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:384:100
> -num_keys 1000000
> 13/03/25 14:54:45 INFO util.MultiThreadedAction: [W:100] Keys=991664,
> cols=3,8m, time=00:03:55 Overall: [keys/s= 4218, latency=23 ms]
> Current: [keys/s=4097, latency=24 ms], insertedUpTo=-1
>
> For a total of 1.5kb with 100 columns = 15 bytes/column
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:15:100
> -num_keys 1000000
> 13/03/25 16:27:44 INFO util.MultiThreadedAction: [W:100] Keys=999721,
> cols=95,3m, time=01:27:46 Overall: [keys/s= 189, latency=525 ms]
> Current: [keys/s=162, latency=616 ms], insertedUpTo=-1
>
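> So overall, the per-column write speed is about the same: going by the
> cols and time figures above, roughly 16,000 columns/s with 4 columns
> versus roughly 18,000 columns/s with 100. That is even a bit faster with
> 100 columns than with 4. I don't think there is any negative impact on
> the HBase side because of all those columns. It might be interesting to
> test the same thing over Thrift...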
>
> JM
>
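The comparison above is easier to follow when the two runs are put on a per-column basis. The following is a minimal sketch that recomputes that rate from the "Overall" log lines quoted above; it assumes "3,8m" and "95,3m" mean 3.8 and 95.3 million columns, and the helper name columns_per_second is made up for illustration.

```python
from datetime import timedelta


def columns_per_second(total_columns: float, elapsed: timedelta) -> float:
    """Average number of columns (cells) written per second over a run."""
    return total_columns / elapsed.total_seconds()


# Figures copied from the two LoadTestTool log lines quoted above.
# Assumption: "3,8m" and "95,3m" are 3.8 and 95.3 million columns.
run_4_cols = columns_per_second(3.8e6, timedelta(minutes=3, seconds=55))
run_100_cols = columns_per_second(
    95.3e6, timedelta(hours=1, minutes=27, seconds=46)
)

print(f"4 columns/row:   {run_4_cols:,.0f} columns/s")
print(f"100 columns/row: {run_100_cols:,.0f} columns/s")
print(f"ratio (100 vs 4): {run_100_cols / run_4_cols:.2f}")
```

Rows per second drop sharply between the two runs, but columns per second stay in the same range, which is what the "speed is the same" remark refers to.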
> 2013/3/25 Pankaj Misra <pankaj.misra@impetus.co.in>:
> > Yes Ted, at higher loads we have been observing the Thrift API clearly
> > outperform the native Java HBase API, due to its binary communication
> > protocol.
> >
> > Tariq, the specs of the machine on which we are performing these tests
> > are given below.
> >
> > Processor: i3770K, 8 logical cores (4 physical, with 2 logical per
> > physical core), 3.5 GHz clock speed
> > RAM: 32 GB DDR3
> > HDD: one 2 TB SATA disk and two 250 GB SATA disks, 3 disks in total
> > HDFS and HBase deployed in pseudo-distributed mode.
> > We have 4 parallel streams writing to HBase.
> >
> > We used the same setup for the previous tests as well, and to be very
> > frank, we did expect a bit of a drop in performance when testing with
> > 40 columns, but did not expect to get half the performance. When we tested
> > with 20 columns, we were consistently getting 200 mbps of writes. But with
> > 40 columns we are getting only 90 mbps of throughput on the same setup.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> >
> > ________________________________________
> > From: Ted Yu [yuzhihong@gmail.com]
> > Sent: Tuesday, March 26, 2013 1:09 AM
> > To: user@hbase.apache.org
> > Subject: Re: HBase Writes With Large Number of Columns
> >
> > bq. These records are being written using batch mutation with thrift API
> > This is important information, I think.
> >
> > Batch mutation through Java API would incur lower overhead.
> >
> > On Mon, Mar 25, 2013 at 11:40 AM, Pankaj Misra
> > <pankaj.misra@impetus.co.in> wrote:
> >
> >> Firstly, thanks a lot Jean and Ted for your extended help; I very much
> >> appreciate it.
> >>
> >> Yes Ted, I am writing to all 40 columns, and the 1.5 KB of record data
> >> is distributed across these columns.
> >>
> >> Jean, some columns store as little as a single byte, while a few of the
> >> columns store as much as 80-125 bytes of data. The overall record size
> >> is 1.5 KB. These records are being written using batch mutation with the
> >> Thrift API, wherein we write 100 records per batch mutation.
> >>
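To make the batching described above concrete, here is a rough sketch of grouping records into batches of 100 and counting how many cells each batch carries. It is plain Python and does not call the Thrift client; batched and cells_per_batch are hypothetical helper names, and the 20- and 40-column figures are the ones mentioned in this thread.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size records."""
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def cells_per_batch(rows_per_batch: int, columns_per_row: int) -> int:
    """Number of individual cell mutations carried by one batch."""
    return rows_per_batch * columns_per_row


# 250 placeholder records split into batches of 100 (hypothetical data).
sizes = [len(b) for b in batched(range(250), 100)]
print("batch sizes:", sizes)
print("cells per batch, 20 columns:", cells_per_batch(100, 20))
print("cells per batch, 40 columns:", cells_per_batch(100, 40))
```

With the row count per batch fixed at 100, doubling the columns per row doubles the number of cells that each batch has to serialize and apply.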
> >> Thanks and Regards
> >> Pankaj Misra
> >>
> >>
> >> ________________________________________
> >> From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
> >> Sent: Monday, March 25, 2013 11:57 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: HBase Writes With Large Number of Columns
> >>
> >> I just ran some LoadTest to see if I can reproduce that.
> >>
> >> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
> >> -num_keys 1000000
> >> 13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
> >> cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
> >> Current: [keys/s=4413, latency=22 ms], insertedUpTo=-1
> >>
> >> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:512:100
> >> -num_keys 1000000
> >>
> >> This one crashed because I don't have enough disk space, so I'm
> >> re-running it, but just before it crashed it was showing about 24.5
> >> times slower, which is consistent since it's writing 25 times more
> >> columns.
> >>
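The reasoning above can be written out as a small calculation. This is only a sketch under the assumption that write cost grows in proportion to the number of columns per row; expected_keys_per_second is a hypothetical helper, and 4242 keys/s is the rate reported for the 4-column run above.

```python
def expected_keys_per_second(
    baseline_keys_per_s: float, baseline_cols: int, target_cols: int
) -> float:
    """Scale a measured row rate, assuming cost is proportional to columns per row."""
    return baseline_keys_per_s * baseline_cols / target_cols


baseline = 4242  # keys/s reported for the 4-column run above
slowdown = 100 / 4  # 25 times more columns per row
estimate = expected_keys_per_second(baseline, 4, 100)

print(f"expected slowdown: {slowdown:.1f}x")
print(f"estimated rate with 100 columns: {estimate:.0f} keys/s")
```

An observed slowdown of about 24.5 times is close to this 25 times estimate, which suggests the per-column cost did not get worse as the column count grew.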
> >> What size of data do you have? Big cells? Small cells? I will retry
> >> the test above with more lines and keep you posted.
> >>
> >> 2013/3/25 Pankaj Misra <pankaj.misra@impetus.co.in>:
> >> > Yes Ted, you are right, we have the table regions pre-split, and we
> >> > see that both regions are almost evenly filled in both the tests.
> >> >
> >> > This does not seem to be a regression though, since we were getting
> >> > good write rates when we had fewer columns.
> >> >
> >> > Thanks and Regards
> >> > Pankaj Misra
> >> >
> >> >
> >> > ________________________________________
> >> > From: Ted Yu [yuzhihong@gmail.com]
> >> > Sent: Monday, March 25, 2013 11:15 PM
> >> > To: user@hbase.apache.org
> >> > Cc: ankitjaincs06@gmail.com
> >> > Subject: Re: HBase Writes With Large Number of Columns
> >> >
> >> > Copying Ankit who raised the same question soon after Pankaj's initial
> >> > question.
> >> >
> >> > On one hand I wonder if this was a regression in 0.94.5 (though
> >> > unlikely).
> >> >
> >> > Did the region servers receive (relatively) the same write load for
> >> > the second test case? I assume you have pre-split your tables in both
> >> > cases.
> >> >
> >> > Cheers
> >> >
> >> > On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
> >> > <pankaj.misra@impetus.co.in> wrote:
> >> >
> >> >> Hi Ted,
> >> >>
> >> >> Sorry for missing that detail, we are using HBase version 0.94.5
> >> >>
> >> >> Regards
> >> >> Pankaj Misra
> >> >>
> >> >>
> >> >> ________________________________________
> >> >> From: Ted Yu [yuzhihong@gmail.com]
> >> >> Sent: Monday, March 25, 2013 10:29 PM
> >> >> To: user@hbase.apache.org
> >> >> Subject: Re: HBase Writes With Large Number of Columns
> >> >>
> >> >> If you give us the version of HBase you're using, that would give us
> >> >> some more information to help you.
> >> >>
> >> >> Cheers
> >> >>
> >> >> On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra
> >> >> <pankaj.misra@impetus.co.in> wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > The issue that I am facing is a drop in HBase performance between
> >> >> > when I had 20 columns in a column family and now, when I have 40
> >> >> > columns in a column family. The number of columns has doubled and
> >> >> > the ingestion/write speed has dropped by half. I am writing 1.5 KB
> >> >> > of data per row across 40 columns.
> >> >> >
> >> >> > Are there any settings that I should look into for tweaking HBase
> >> >> > to write a higher number of columns faster?
> >> >> >
> >> >> > I would request the community's help in letting me know how I can
> >> >> > write efficiently to a column family with a large number of columns.
> >> >> >
> >> >> > Would greatly appreciate any help/clues around this issue.
> >> >> >
> >> >> > Thanks and Regards
> >> >> > Pankaj Misra
> >> >> >
> >> >>
> >> >
> >
> > ________________________________
> >
> > NOTE: This message may contain information that is confidential,
> > proprietary, privileged or otherwise protected by law. The message is
> > intended solely for the named addressee. If received in error, please
> > destroy and notify the sender. Any use of this email is prohibited when
> > received in error. Impetus does not represent, warrant and/or guarantee,
> > that the integrity of this communication has been maintained nor that the
> > communication is free of errors, virus, interception or interference.
>
