hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Large number of column qualifiers
Date Thu, 24 Sep 2015 14:20:28 GMT
Gaurav:
Please also check GC activities on the client side.

Here is the reason I brought this to your attention:
HBASE-14177 Full GC on client may lead to missing scan results

Cheers

On Thu, Sep 24, 2015 at 2:13 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Hi
>
> In the version that you were using, the default caching was 1000 (I
> believe; I need to check the old code).  So in that case it was trying to
> fetch 1000 rows, each with 20k columns.  Now, when you say that the
> client was missing rows, did you check the server logs?
>
> Did you get any OutOfOrderScannerNextException?  There is a property called
> 'client.rpc.timeout' which can be increased in your case, provided
> your caching and batching are adjusted.
>
> In the current trunk code there is no default caching value: unless one is
> specified, the server tries to fetch 2MB of data and sends that back to
> the client.
> In any case I would suggest checking your server logs for any exceptions.
> Increase the timeout property and adjust your caching and batching to fetch
> the data.  If the client is still missing rows, then we need the logs
> to analyse things.  Ted's mail referring to
> https://issues.apache.org/jira/browse/HBASE-11544 will give an idea of the
> general behaviour of scans and how it affects scanning bigger and wider
> rows.
>
> Regards
> Ram
>
>
> On Thu, Sep 24, 2015 at 2:32 PM, Gaurav Agarwal <gaurav@arkin.net> wrote:
>
> > Hi,
> >
> > The problem that I am actually facing is that when doing a scan over rows
> > where each row has a very large number of cells (a large number of columns),
> > the scan API seems to be transparently dropping data. In my case I noticed
> > that an entire row of data was missing in a few cases.
> >
> > On a suggestion from Ram (above), I tried *scan.setCaching(1)* and,
> > optionally, *scan.setBatch(5000)*, and the problem got resolved (at least
> > for now).  So this indicates that the client (it cannot be the server, I
> > hope) was dropping cells if the number (or maybe the bytes) of cells became
> > quite large across the number of rows cached. Note that in my case, the
> > number of bytes per cell is close to 30B (including qualifier, value and
> > timestamp) and each row key is close to 20B.
> >
> > I am not clear what setting controls the maximum number/bytes of cells that
> > can be received by the client before this problem surfaces. Can someone
> > please point me to these settings/code?
> >
> > On Thu, Sep 24, 2015 at 12:05 PM, Gaurav Agarwal <gaurav@arkin.net> wrote:
> >
> > > After spending more time I realised that my understanding and my question
> > > were invalid.
> > > I am still trying to get more information regarding the problem and will
> > > update the thread once I have a better handle on the problem.
> > >
> > > Apologies for the confusion..
> > >
> > > On Thu, Sep 24, 2015 at 10:32 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > >> I am not sure whether you have tried it, but the scan API has a feature
> > >> called 'batching'. Did you try it?  So if a row has many columns you can
> > >> still limit the amount of data being sent to the client. I think the main
> > >> issue you are facing is that the qualifiers getting returned are more in
> > >> number and so the client is not able to accept them?
> > >>
> > >> 'Short.MAX_VALUE which is 32,767 bytes.'
> > >> This comment applies to the qualifier length, i.e. the name that you
> > >> specify for the qualifier, not to the number of qualifiers.
> > >>
> > >> Regards
> > >> Ram
> > >>
> > >> On Thu, Sep 24, 2015 at 8:52 AM, Anoop John <anoop.hbase@gmail.com> wrote:
> > >>
> > >> > >>I have Column Family with very large number of column qualifiers (>
> > >> > >>50,000). Each column qualifier is 8 bytes long.
> > >> >
> > >> > When you say you have 50000 qualifiers in a CF, it means you will have
> > >> > that many cells under that CF per row.  So I am not getting where the
> > >> > qualifier length limit comes in. Per qualifier, you will have a
> > >> > different cell with its own qualifier.
> > >> >
> > >> > -Anoop-
> > >> >
> > >> >
> > >> > On Thu, Sep 24, 2015 at 1:13 AM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> > >> >
> > >> > > Yes, the comment is incorrect.
> > >> > >
> > >> > > hbase.client.keyvalue.maxsize controls the max key-value size, but it's
> > >> > > unlimited in master (I was wrong about 1MB, this is probably for
> > >> > > older versions of HBase)
> > >> > >
> > >> > >
> > >> > > -Vlad
> > >> > >
> > >> > > On Wed, Sep 23, 2015 at 11:45 AM, Gaurav Agarwal <gaurav@arkin.net> wrote:
> > >> > >
> > >> > > > Thanks Vlad. Could you please point me to the KV size setting
> > >> > > > (default 1MB)?
> > >> > > > Just to make sure that I understand correctly, are you suggesting that
> > >> > > > the following comment is incorrect in Cell.java?
> > >> > > >
> > >> > > >   /**
> > >> > > >    * Contiguous raw bytes that may start at any index in the containing
> > >> > > >    * array. Max length is Short.MAX_VALUE which is 32,767 bytes.
> > >> > > >    * @return The array containing the qualifier bytes.
> > >> > > >    */
> > >> > > >   byte[] getQualifierArray();
> > >> > > >
> > >> > > > On Thu, Sep 24, 2015 at 12:10 AM, Gaurav Agarwal <gaurav@arkin.net> wrote:
> > >> > > >
> > >> > > > > Thanks Vlad. Could you please point me to the KV size setting
> > >> > > > > (default 1MB)?
> > >> > > > > Just to make sure that I understand correctly: the following comment
> > >> > > > > is incorrect in Cell.java:
> > >> > > > >
> > >> > > > >   /**
> > >> > > > >    * Contiguous raw bytes that may start at any index in the containing
> > >> > > > >    * array. Max length is Short.MAX_VALUE which is 32,767 bytes.
> > >> > > > >    * @return The array containing the qualifier bytes.
> > >> > > > >    */
> > >> > > > >   byte[] getQualifierArray();
> > >> > > > >
> > >> > > > > On Wed, Sep 23, 2015 at 11:43 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> > >> > > > >
> > >> > > > >> Check the KeyValue class (Cell's implementation). getQualifierArray()
> > >> > > > >> returns the kv's backing array. There is no SHORT limit on the size of
> > >> > > > >> this array, but there are other limits in HBase: maximum KV size, for
> > >> > > > >> example, which is configurable but, by default, is 1MB. Having 50K
> > >> > > > >> qualifiers is a bad idea.
> > >> > > > >> Consider redesigning your data model and using the rowkey instead.
> > >> > > > >>
> > >> > > > >> -Vlad
> > >> > > > >>
> > >> > > > >> On Wed, Sep 23, 2015 at 10:24 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >> > > > >>
> > >> > > > >> > Please take a look at HBASE-11544 which is in hbase 1.1
> > >> > > > >> >
> > >> > > > >> > Cheers
> > >> > > > >> >
> > >> > > > >> > On Wed, Sep 23, 2015 at 10:18 AM, Gaurav Agarwal <gaurav@arkin.net> wrote:
> > >> > > > >> >
> > >> > > > >> > > Hi All,
> > >> > > > >> > >
> > >> > > > >> > > I have a Column Family with a very large number of column qualifiers
> > >> > > > >> > > (> 50,000). Each column qualifier is 8 bytes long. The problem is that
> > >> > > > >> > > when I do a scan operation to fetch some rows, the client side Cell
> > >> > > > >> > > object does not have enough space allocated in it to hold all the
> > >> > > > >> > > columnQualifiers for a given row and hence I cannot read all the
> > >> > > > >> > > columns back for a given row.
> > >> > > > >> > >
> > >> > > > >> > > Please see the code snippet that I am using:
> > >> > > > >> > >
> > >> > > > >> > >  final ResultScanner rs = htable.getScanner(scan);
> > >> > > > >> > >  for (Result row = rs.next(); row != null; row = rs.next()) {
> > >> > > > >> > >      final Cell[] cells = row.rawCells();
> > >> > > > >> > >      if (cells != null) {
> > >> > > > >> > >          for (final Cell cell : cells) {
> > >> > > > >> > >              final long c = Bytes.toLong(cell.getQualifierArray(),
> > >> > > > >> > >                      cell.getQualifierOffset(), cell.getQualifierLength());
> > >> > > > >> > >              final long v = Bytes.toLong(cell.getValueArray(),
> > >> > > > >> > >                      cell.getValueOffset());
> > >> > > > >> > >              points.put(c, v);
> > >> > > > >> > >          }
> > >> > > > >> > >      }
> > >> > > > >> > >  }
> > >> > > > >> > >
> > >> > > > >> > > The cell.getQualifierArray() method says that its 'Max length is
> > >> > > > >> > > Short.MAX_VALUE which is 32,767 bytes.'. Hence it can only hold around
> > >> > > > >> > > 4,000 columnQualifiers.
> > >> > > > >> > >
> > >> > > > >> > > Is there an alternate API that I should be using or am I missing some
> > >> > > > >> > > setting here? Note that in the worst case I need to read all the
> > >> > > > >> > > columnQualifiers in a row and I may or may not know a subset to fetch
> > >> > > > >> > > in advance.
> > >> > > > >> > >
> > >> > > > >> > > Even if this is not possible in a single call, is there a way to cursor
> > >> > > > >> > > through the columnQualifiers?
> > >> > > > >> > >
> > >> > > > >> > > I am presently using the HBase 0.96 client but can switch to HBase 1.x
> > >> > > > >> > > if there is an API in the newer version.
> > >> > > > >> > >
> > >> > > > >> > > --cheers, gaurav
> > >> > > > >> > >
> > >> > > > >> > > --
> > >> > > > >> > > --cheers, gaurav
> > >> > > > >> > >
> > >> > > > >> >
> > >> > > > >>
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > > --cheers, gaurav
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > --cheers, gaurav
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > --cheers, gaurav
> > >
> >
> >
> >
> > --
> > --cheers, gaurav
> >
>
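As a rough illustration of why scan.setCaching(1) and scan.setBatch(5000) made a difference in the thread above, the sketch below estimates how much data one scanner fetch carries, using the figures quoted in the thread (about 30B per cell, a 20B row key, 20k to 50k cells per row, caching of 1000). The class and method names are hypothetical and nothing here calls HBase. It assumes the row key is counted once per row and ignores per-cell framing overhead, so real payloads would be larger.

```java
final class ScanPayloadEstimate {
    private ScanPayloadEstimate() {}

    /** Approximate bytes in one fetch: rows * (rowKey + cellsPerRow * bytesPerCell). */
    static long bytesPerFetch(long rowsPerFetch, long cellsPerRow,
                              long bytesPerCell, long rowKeyBytes) {
        return rowsPerFetch * (rowKeyBytes + cellsPerRow * bytesPerCell);
    }

    /** Number of partial Results one row is split into when a batch limit caps cells per Result. */
    static long resultsPerRow(long cellsPerRow, long batch) {
        return (cellsPerRow + batch - 1) / batch; // ceiling division
    }

    public static void main(String[] args) {
        // Caching of 1000 rows, 20k cells each: roughly 600 MB in one fetch.
        System.out.println(bytesPerFetch(1000, 20_000, 30, 20));
        // Caching of 1 row, 50k cells: roughly 1.5 MB in one fetch.
        System.out.println(bytesPerFetch(1, 50_000, 30, 20));
        // A 50k-cell row with a batch of 5000 arrives as this many partial Results.
        System.out.println(resultsPerRow(50_000, 5_000));
    }
}
```

Under these assumptions the old default of 1000 cached rows implies a fetch several hundred times larger than the 2MB that Ram mentions for trunk, which is consistent with timeouts or client GC pressure being the area to investigate.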

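Vlad's point that getQualifierArray() returns the KeyValue's whole backing array explains why the original snippet passes an offset and a length when decoding: the qualifier is only a slice of a larger shared array. The following HBase-free sketch shows that pattern with java.nio.ByteBuffer, reading big-endian longs (the byte order Bytes.toLong uses). The class name and the layout of the fake backing array are made up for illustration and do not reflect the real KeyValue layout.

```java
import java.nio.ByteBuffer;

final class BackingArrayDecode {
    private BackingArrayDecode() {}

    /** Reads an 8-byte big-endian long that starts at offset inside a larger shared array. */
    static long readLong(byte[] backing, int offset, int length) {
        if (length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + length);
        }
        return ByteBuffer.wrap(backing, offset, length).getLong();
    }

    /** Builds a made-up backing array: prefixLen filler bytes, then the qualifier, then the value. */
    static byte[] fakeBackingArray(int prefixLen, long qualifier, long value) {
        ByteBuffer buf = ByteBuffer.allocate(prefixLen + 2 * Long.BYTES);
        buf.position(prefixLen);
        buf.putLong(qualifier).putLong(value);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] backing = fakeBackingArray(20, 42L, 7L);
        // The qualifier starts after the 20-byte prefix, the value 8 bytes later.
        System.out.println(readLong(backing, 20, Long.BYTES));
        System.out.println(readLong(backing, 28, Long.BYTES));
    }
}
```

The length of the backing array says nothing about how many qualifiers a row can hold, since each cell carries its own qualifier slice.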