hadoop-yarn-issues mailing list archives

From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5336) Put in some limit for accepting key-values in hbase writer
Date Wed, 02 Nov 2016 20:57:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630472#comment-15630472
] 

Vrushali C edited comment on YARN-5336 at 11/2/16 8:57 PM:
-----------------------------------------------------------

Some other interesting points to keep in mind:

As per https://hbase.apache.org/book.html#table_schema_rules_of_thumb , we should aim to keep
cells within the HBase size guidelines. The relevant rules of thumb from the reference guide
are:

Aim to have regions sized between 10 and 50 GB.

Aim to have cells no larger than 10 MB, or 50 MB if you use MOB. Otherwise, consider storing
your cell data in HDFS and storing a pointer to the data in HBase.
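
A minimal sketch of how a writer might apply this rule, in Python for brevity. The names
INLINE_LIMIT_BYTES, choose_storage and the hdfs_path_for callback are hypothetical and are
not part of HBase or the timeline service; the 10 MB figure is the rule of thumb quoted
above, and binary megabytes are an assumption.

```python
from typing import Callable, Tuple

# Rule-of-thumb ceiling for a non-MOB cell (assumes 1 MB = 1024 * 1024 bytes).
INLINE_LIMIT_BYTES = 10 * 1024 * 1024


def choose_storage(
    key: str,
    value: bytes,
    hdfs_path_for: Callable[[str], str],
    inline_limit: int = INLINE_LIMIT_BYTES,
) -> Tuple[str, bytes]:
    """Return ("inline", value) or ("pointer", encoded HDFS path).

    hdfs_path_for is a hypothetical callback that maps a key to the
    location where an oversized payload would be written separately.
    """
    if len(value) <= inline_limit:
        # Small enough to live in the cell itself.
        return ("inline", value)
    # Too large for a cell: keep only a pointer to the data in HBase.
    return ("pointer", hdfs_path_for(key).encode("utf-8"))
```

Making inline_limit a parameter with a default mirrors the "default and configurable" limit
this issue asks for.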

A typical schema has between 1 and 3 column families per table. HBase tables should not be
designed to mimic RDBMS tables. Around 50-100 regions is a good number for a table with 1
or 2 column families. Remember that a region is a contiguous segment of a column family.
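
As a rough illustration of how the region size and region count guidance interact, the
arithmetic can be sketched as below. The helper names are hypothetical, "GB" is assumed to
mean 1024^3 bytes, and regions are assumed to be evenly filled, which real tables rarely are.

```python
GIB = 1024 ** 3


def estimate_region_count(table_bytes: int, region_bytes: int) -> int:
    """Regions needed if every region holds region_bytes (at least one)."""
    if region_bytes <= 0:
        raise ValueError("region_bytes must be positive")
    # Integer ceiling division avoids float rounding on large sizes.
    return max(1, (table_bytes + region_bytes - 1) // region_bytes)


def within_guidance(table_bytes: int, region_bytes: int) -> bool:
    """True when region size is 10-50 GB and the count is 50-100."""
    size_ok = 10 * GIB <= region_bytes <= 50 * GIB
    count = estimate_region_count(table_bytes, region_bytes)
    return size_ok and 50 <= count <= 100


# A 1 TB table split into 20 GB regions needs 52 regions.
print(estimate_region_count(1024 * GIB, 20 * GIB))
```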

Keep your column family names as short as possible. The column family names are stored for
every value (ignoring prefix encoding). They should not be self-documenting and descriptive
like in a typical RDBMS.
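
To get a feel for why short family names matter, here is a back-of-the-envelope sketch. The
function names and the example family names are hypothetical, and the estimate deliberately
ignores prefix encoding and compression, as the guidance above does.

```python
def family_name_overhead(cell_count: int, family_name: str) -> int:
    """Raw bytes spent repeating the family name across cell_count cells."""
    return cell_count * len(family_name.encode("utf-8"))


def bytes_saved(cell_count: int, long_name: str, short_name: str) -> int:
    """Raw bytes saved by using short_name instead of long_name."""
    return (family_name_overhead(cell_count, long_name)
            - family_name_overhead(cell_count, short_name))


# One billion cells, 13-byte name versus 1-byte name: 12,000,000,000 bytes.
print(bytes_saved(1_000_000_000, "configuration", "c"))
```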

About medium-sized objects, or MOBs (https://hbase.apache.org/book.html#hbase_mob)

While HBase can technically handle binary objects with cells that are larger than 100 KB in
size, HBase’s normal read and write paths are optimized for values smaller than 100 KB in
size. When HBase deals with large numbers of objects over this threshold, referred to here
as medium objects, or MOBs, performance is degraded due to write amplification caused by splits
and compactions. When using MOBs, ideally your objects will be between 100 KB and 10 MB. HBase
FIX_VERSION_NUMBER adds support for better managing large numbers of MOBs while maintaining
performance, consistency, and low operational overhead. MOB support is provided by the work
done in HBASE-11339. To take advantage of MOB, you need to use HFile version 3. Optionally,
configure the MOB file reader’s cache settings for each RegionServer (see Configuring the
MOB Cache), then configure specific columns to hold MOB data. Client code does not need to
change to take advantage of HBase MOB support. The feature is transparent to the client.
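
The size bands described above can be sketched as a simple classifier. This is only an
illustration: the constant and function names are hypothetical, binary units are assumed,
and a value of exactly 100 KB is treated as a MOB because the text leaves that boundary
open.

```python
KB = 1024
MB = 1024 * KB

# Below this size the normal read and write paths are a good fit.
MOB_THRESHOLD_BYTES = 100 * KB
# Ideal upper bound for a MOB value according to the text above.
MOB_IDEAL_MAX_BYTES = 10 * MB


def classify_value(size_bytes: int) -> str:
    """Return "normal", "mob" or "oversized" for a value of size_bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < MOB_THRESHOLD_BYTES:
        return "normal"
    if size_bytes <= MOB_IDEAL_MAX_BYTES:
        return "mob"
    # Larger than the ideal MOB range: a candidate for a configured limit
    # or for storage outside HBase.
    return "oversized"
```

A writer-side limit could reject or divert anything classified as "oversized" before it
reaches the backend.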




was (Author: vrushalic):
Some other interesting points to keep in mind:

As per https://hbase.apache.org/book.html#table_schema_rules_of_thumb , we should aim to have
cells no larger than 10 MB, or 50 MB if we use mob. Otherwise, consider storing your cell
data in HDFS and store a pointer to the data in HBase.

Aim to have regions sized between 10 and 50 GB.

Aim to have cells no larger than 10 MB, or 50 MB if you use mob. Otherwise, consider storing
your cell data in HDFS and store a pointer to the data in HBase.

A typical schema has between 1 and 3 column families per table. HBase tables should not be
designed to mimic RDBMS tables.

Around 50-100 regions is a good number for a table with 1 or 2 column families. Remember that
a region is a contiguous segment of a column family.

Keep your column family names as short as possible. The column family names are stored for
every value (ignoring prefix encoding). They should not be self-documenting and descriptive
like in a typical RDBMS.

About Medium sized objects (https://hbase.apache.org/book.html#hbase_mob)

While HBase can technically handle binary objects with cells that are larger than 100 KB in
size, HBase’s normal read and write paths are optimized for values smaller than 100KB in
size. When HBase deals with large numbers of objects over this threshold, referred to here
as medium objects, or MOBs, performance is degraded due to write amplification caused by splits
and compactions. When using MOBs, ideally your objects will be between 100KB and 10MB. HBase
FIX_VERSION_NUMBER adds support for better managing large numbers of MOBs while maintaining
performance, consistency, and low operational overhead. MOB support is provided by the work
done in HBASE-11339. To take advantage of MOB, you need to use HFile version 3. Optionally,
configure the MOB file reader’s cache settings for each RegionServer (see Configuring the
MOB Cache), then configure specific columns to hold MOB data. Client code does not need to
change to take advantage of HBase MOB support. The feature is transparent to the client.



> Put in some limit for accepting key-values in hbase writer
> ----------------------------------------------------------
>
>                 Key: YARN-5336
>                 URL: https://issues.apache.org/jira/browse/YARN-5336
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>              Labels: YARN-5355
>
> As recommended by [~jrottinghuis], we need to add a limit (default and configurable)
> for accepting key-values to be written to the backend.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


