hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Extremely high CPU usage after upgrading to Hbase 1.4.4
Date Sat, 08 Sep 2018 05:40:41 GMT
The createFirstOnRow() is used by ColumnXXFilter's getNextCellHint() method.
I am thinking about adding a variant to getNextCellHint() which returns a
tuple, representing first on row, consisting of:
  Cell - the passed in Cell instance
  byte[] - qualifier array
  int - qualifier offset
  int - qualifier length
This variant doesn't allocate (new) Cell / KeyValue.

This way, FilterListWithOR#shouldPassCurrentCellToFilter can use the
returned tuple for comparison.
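
A minimal sketch of the tuple idea above, written in plain Java with no HBase dependency. The names `FirstOnRowHint` and `compareQualifierToHint` are hypothetical and not part of HBase; the `cell` field is typed as `Object` only as a stand-in for the passed-in Cell. It illustrates comparing the current qualifier with the hinted qualifier directly on the backing arrays, so no new Cell / KeyValue is created per comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class QualifierHintSketch {

    /** Hypothetical stand-in for the proposed tuple; not an HBase class. */
    static final class FirstOnRowHint {
        final Object cell;      // the passed-in Cell instance, reused as is
        final byte[] qualifier; // array holding the hinted qualifier
        final int offset;       // start of the qualifier inside the array
        final int length;       // number of qualifier bytes

        FirstOnRowHint(Object cell, byte[] qualifier, int offset, int length) {
            this.cell = cell;
            this.qualifier = qualifier;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Unsigned lexicographic comparison of a qualifier against the hint.
     * Works on the existing arrays; nothing is copied or allocated.
     */
    static int compareQualifierToHint(byte[] q, int qOff, int qLen, FirstOnRowHint hint) {
        return Arrays.compareUnsigned(
                q, qOff, qOff + qLen,
                hint.qualifier, hint.offset, hint.offset + hint.length);
    }

    public static void main(String[] args) {
        byte[] prefix = "email-123".getBytes(StandardCharsets.UTF_8);
        FirstOnRowHint hint = new FirstOnRowHint("cell-placeholder", prefix, 0, prefix.length);

        byte[] current = "email-100-a".getBytes(StandardCharsets.UTF_8);
        int cmp = compareQualifierToHint(current, 0, current.length, hint);
        System.out.println(cmp < 0 ? "current cell sorts before the hint" : "current cell is at or past the hint");
    }
}
```

`Arrays.compareUnsigned` with ranges requires Java 9 or later; on older JVMs a hand-written loop over the two ranges would serve the same purpose.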

FYI

On Fri, Sep 7, 2018 at 10:00 PM Ted Yu <yuzhihong@gmail.com> wrote:

> Thanks for detailed background information.
>
> I assume your code has done de-dup for the filters contained in
> FilterListWithOR.
>
> I took a look at the JIRAs which
> touched hbase-client/src/main/java/org/apache/hadoop/hbase/filter in
> branch-1.4.
> There were a few patches (some were very big) since the release of 1.3.0,
> so it is not obvious at first glance which one(s) might be related.
>
> I noticed ColumnPrefixFilter.getNextCellHint (and
> KeyValueUtil.createFirstOnRow) appearing many times in the stack trace.
>
> I plan to dig more in this area.
>
> Cheers
>
> On Fri, Sep 7, 2018 at 11:30 AM Srinidhi Muppalla <srinidhim@trulia.com>
> wrote:
>
>> Sure thing. For our table schema, each row represents one user and the
>> row key is that user’s unique id in our system. We currently only use one
>> column family in the table. The column qualifiers represent an item that
>> has been surfaced to that user as well as additional information to
>> differentiate the way the item has been surfaced to the user. Without
>> getting into too many specifics, the qualifier follows the rough format of:
>>
>> “Channel-itemId-distinguisher”.
>>
>> The channel here is the channel through which the item was previously surfaced
>> to the user. The itemId is the unique id of the item that has been surfaced
>> to the user. A distinguisher is some attribute about how that item was
>> surfaced to the user.
>>
>> When we run a scan, we currently only ever run it on one row at a time.
>> It was chosen over ‘get’ because (from our understanding) the performance
>> difference is negligible, and down the road using scan would allow us some
>> more flexibility.
>>
>> The filter list that is constructed with scan works by using a
>> ColumnPrefixFilter as you mentioned. When a user is being communicated to
>> on a particular channel, we have a list of items that we want to
>> potentially surface for that user. So, we construct a prefix list with the
>> channel and each of the item ids in the form of: “channel-itemId”. Then we
>> run a scan on that row with that filter list using “WithOr” to get all of
>> the matching channel-itemId combinations currently in that row/column
>> family in the table. This way we can then know which of the items we want
>> to surface to that user on that channel have already been surfaced on that
>> channel. The reason we query using a prefix filter is so that we don’t need
>> to know the ‘distinguisher’ part of the record when writing the actual
>> query, because the distinguisher is only relevant in certain circumstances.
>>
>> Let me know if this is the information about our query pattern that you
>> were looking for and if there is anything I can clarify or add.
>>
>> Thanks,
>> Srinidhi
>>
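
The query pattern described in the message above can be sketched in plain Java, without an HBase dependency, to make it easier to reason about. This is only an illustration under assumptions: the class and method names (`PrefixOrSketch`, `buildPrefixes`, `matchAny`) and the sample qualifiers are made up, and the row is modelled as a sorted set of qualifier strings. In real client code the same shape would be a FilterList with Operator.MUST_PASS_ONE holding one ColumnPrefixFilter per "channel-itemId" prefix. The sketch also de-duplicates the prefixes, as assumed elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixOrSketch {

    /** Build "channel-itemId" prefixes; the TreeSet removes duplicates. */
    static SortedSet<String> buildPrefixes(String channel, Collection<String> itemIds) {
        SortedSet<String> prefixes = new TreeSet<>();
        for (String itemId : itemIds) {
            prefixes.add(channel + "-" + itemId);
        }
        return prefixes;
    }

    /** Keep every qualifier that starts with at least one prefix (OR semantics). */
    static List<String> matchAny(SortedSet<String> qualifiers, SortedSet<String> prefixes) {
        List<String> hits = new ArrayList<>();
        for (String qualifier : qualifiers) {
            for (String prefix : prefixes) {
                if (qualifier.startsWith(prefix)) {
                    hits.add(qualifier);
                    break; // one matching prefix is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up qualifiers for one user's row, in "Channel-itemId-distinguisher" form.
        SortedSet<String> row = new TreeSet<>(Arrays.asList(
                "email-100-a", "email-123-a", "email-123-b", "push-123-a"));

        // Candidate items for the "email" channel; "123" is listed twice on purpose.
        SortedSet<String> prefixes = buildPrefixes("email", Arrays.asList("123", "456", "123"));

        System.out.println(matchAny(row, prefixes)); // prints [email-123-a, email-123-b]
    }
}
```

Note that the nested loop does work proportional to the number of qualifiers times the number of prefixes, which gives a rough feel for why a long OR list of prefix filters can be expensive per cell.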
>> On 9/6/18, 12:24 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>>
>>     From the stack trace, ColumnPrefixFilter is used during scan.
>>
>>     Can you illustrate how the various filters are formed through FilterListWithOR?
>>     It would be easier for other people to reproduce the problem given your
>>     query pattern.
>>
>>     Cheers
>>
>>     On Thu, Sep 6, 2018 at 11:43 AM Srinidhi Muppalla <srinidhim@trulia.com> wrote:
>>
>>     > Hi Vlad,
>>     >
>>     > Thank you for the suggestion. I recreated the issue and attached the stack
>>     > traces I took. Let me know if there’s any other info I can provide. We
>>     > narrowed the issue down to occurring when upgrading from 1.3.0 to any 1.4.x
>>     > version.
>>     >
>>     > Thanks,
>>     > Srinidhi
>>     >
>>     > On 9/4/18, 8:19 PM, "Vladimir Rodionov" <vladrodionov@gmail.com> wrote:
>>     >
>>     >     Hi, Srinidhi
>>     >
>>     >     Next time you see this issue, take a jstack of a RS several times in a
>>     >     row. Without stack traces it is hard to tell what was going on with
>>     >     your cluster after the upgrade.
>>     >
>>     >     -Vlad
>>     >
>>     >
>>     >
>>     >     On Tue, Sep 4, 2018 at 3:50 PM Srinidhi Muppalla <srinidhim@trulia.com> wrote:
>>     >
>>     >     > Hello all,
>>     >     >
>>     >     > We are currently running Hbase 1.3.0 on an EMR cluster running EMR 5.5.0.
>>     >     > Recently, we attempted to upgrade our cluster to using Hbase 1.4.4 (along
>>     >     > with upgrading our EMR cluster to 5.16). After upgrading, the CPU usage for
>>     >     > all of our region servers spiked up to 90%. The load_one for all of our
>>     >     > servers spiked from roughly 1-2 to 10 threads. After upgrading, the number
>>     >     > of operations to the cluster hasn’t increased. After giving the cluster a
>>     >     > few hours, we had to revert the upgrade. From the logs, we are unable to
>>     >     > tell what is occupying the CPU resources. Is this a known issue with 1.4.4?
>>     >     > Any guidance or ideas for debugging the cause would be greatly
>>     >     > appreciated. What are the best steps for debugging CPU usage?
>>     >     >
>>     >     > Thank you,
>>     >     > Srinidhi
>>     >     >
>>     >
>>     >
>>     >
>>
>>
>>
