hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: parallel scanning?
Date Sat, 06 Feb 2016 13:29:06 GMT
bq. we can write twice or multiple times with no problem

If you always write twice, the latency would go up. And there is no
guarantee that both of the writes would be successful.

On Fri, Feb 5, 2016 at 9:48 PM, Jameson Li <hovlj.ei@gmail.com> wrote:

> ''By line, did you mean number of rows ?
>
> Yes, sorry for my poor English.
>
> ''In the above case, handling a failed write (to the second table) becomes a
> bit tricky.
>
> Yes, but I think sometimes the write problem can be solved more easily than
> the read problem, and sometimes we can write twice or multiple times with no
> problem (provided we do not set the column timestamp ourselves).
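>
> A rough sketch of what I mean, assuming the standard HBase 1.x Java client
> API (the table name, column family and class name are made up): because the
> Put does not set an explicit timestamp, replaying it after a failure just
> writes the same value again under a newer timestamp, so a duplicate write
> does no harm.
>
> import java.io.IOException;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.client.Table;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class IndexWriteRetry {
>     // Retry the write to the second (index) table. A duplicate write only
>     // rewrites the same cell with a newer timestamp, so the retry is safe
>     // as long as we never set the timestamp ourselves.
>     public static void writeIndex(Connection conn, byte[] indexRowkey,
>                                   byte[] value) throws IOException {
>         Put indexPut = new Put(indexRowkey)
>                 .addColumn(Bytes.toBytes("d"), Bytes.toBytes("v"), value);
>         int attempts = 0;
>         while (true) {
>             try (Table indexTable = conn.getTable(TableName.valueOf("event_index"))) {
>                 indexTable.put(indexPut);
>                 return;
>             } catch (IOException e) {
>                 if (++attempts >= 3) {
>                     throw e; // give up; an offline job can repair the row later
>                 }
>             }
>         }
>     }
> }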
>
>
>
>
> 2016-02-05 20:13 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
>
> > bq. when the result is so many lines
> >
> > By line, did you mean number of rows ?
> >
> > bq. one table with rowkey as A_B_time, another as B_A_time
> >
> > In the above case, handling a failed write (to the second table) becomes a
> > bit tricky.
> >
> > Cheers
> >
> > On Fri, Feb 5, 2016 at 12:08 AM, Jameson Li <hovlj.ei@gmail.com> wrote:
> >
> > > 2016-01-26 2:29 GMT+08:00 Henning Blohm <henning.blohm@zfabrik.de>:
> > >
> > > > I am looking for advice on an HBase mass data access optimization
> > > > problem.
> > > >
> > >
> > > For multi-get and multi-scan:
> > > In my opinion, multi-get (fetching fewer lines) can work for a realtime
> > > query. Multi-scan may also work, but it easily keeps the servers busy and
> > > slows other small queries down to big query times. On the other hand,
> > > multi-get's query time is not stable: when one of the regions is busy,
> > > the whole query time goes up.
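> > >
> > > For example, here is a sketch with the HBase 1.x Java client (the table
> > > name and row keys are placeholders, not from our schema): a multi-get is
> > > one batched client call, while a multi-scan issues several range scans
> > > that each keep a region busy for longer.
> > >
> > > import java.io.IOException;
> > > import java.util.ArrayList;
> > > import java.util.List;
> > > import org.apache.hadoop.hbase.TableName;
> > > import org.apache.hadoop.hbase.client.Connection;
> > > import org.apache.hadoop.hbase.client.Get;
> > > import org.apache.hadoop.hbase.client.Result;
> > > import org.apache.hadoop.hbase.client.ResultScanner;
> > > import org.apache.hadoop.hbase.client.Scan;
> > > import org.apache.hadoop.hbase.client.Table;
> > > import org.apache.hadoop.hbase.util.Bytes;
> > >
> > > public class MultiGetVsMultiScan {
> > >     // Multi-get: one batched call. The client groups the Gets by region
> > >     // server, so the total latency is roughly that of the slowest region.
> > >     public static Result[] multiGet(Connection conn, List<String> rowkeys)
> > >             throws IOException {
> > >         List<Get> gets = new ArrayList<Get>();
> > >         for (String rowkey : rowkeys) {
> > >             gets.add(new Get(Bytes.toBytes(rowkey)));
> > >         }
> > >         try (Table table = conn.getTable(TableName.valueOf("events"))) {
> > >             return table.get(gets); // one Result per Get, empty if missing
> > >         }
> > >     }
> > >
> > >     // Multi-scan: several range scans. Each scan reads more data on the
> > >     // server side and keeps the region busy longer, which can slow down
> > >     // other small queries.
> > >     public static void multiScan(Connection conn, List<byte[][]> ranges)
> > >             throws IOException {
> > >         try (Table table = conn.getTable(TableName.valueOf("events"))) {
> > >             for (byte[][] range : ranges) {
> > >                 Scan scan = new Scan();
> > >                 scan.setStartRow(range[0]);
> > >                 scan.setStopRow(range[1]);
> > >                 scan.setCaching(100); // rows fetched per RPC
> > >                 try (ResultScanner rs = table.getScanner(scan)) {
> > >                     for (Result r : rs) {
> > >                         // process r
> > >                     }
> > >                 }
> > >             }
> > >         }
> > >     }
> > > }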
> > >
> > > For realtime and offline:
> > > Watch your real query results. When the result is so many lines, like a
> > > megabyte or 10 megabytes, the query time will not stay down at
> > > milliseconds, because of the network transfer time. We must reduce the
> > > result lines, the result size, or the result columns, or it is not
> > > suitable for a real realtime query.
> > > If you actually need that many queries with such big results, I suggest
> > > working offline and in parallel, not in realtime, because the server
> > > network throughput will not keep up either (a 1000 Mbit NIC is roughly
> > > 125 Mbyte/s, so at about 2 Mbyte per query one server can only handle
> > > around 50 qps).
> > >
> > > If it is just the query issue (multi-scan and multi-get), I think we can
> > > trade extra storage for query performance, by using an extra table (which
> > > means writing twice) with another schema, e.g. one table with rowkey
> > > A_B_time and another with B_A_time. When querying B%, we just query the
> > > B_A_time table with one small scan, and we do not need to query the
> > > A_B_time table with multi-scans.
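> > >
> > > A sketch of this two-table idea with the HBase 1.x Java client (the table
> > > names, column family and qualifier below are just examples):
> > >
> > > import java.io.IOException;
> > > import org.apache.hadoop.hbase.TableName;
> > > import org.apache.hadoop.hbase.client.Connection;
> > > import org.apache.hadoop.hbase.client.Put;
> > > import org.apache.hadoop.hbase.client.Result;
> > > import org.apache.hadoop.hbase.client.ResultScanner;
> > > import org.apache.hadoop.hbase.client.Scan;
> > > import org.apache.hadoop.hbase.client.Table;
> > > import org.apache.hadoop.hbase.util.Bytes;
> > >
> > > public class TwoTableSchema {
> > >     // Write the same cell twice, once per rowkey order (A_B_time and B_A_time).
> > >     public static void writeBoth(Connection conn, String a, String b,
> > >                                  long time, byte[] value) throws IOException {
> > >         byte[] cf = Bytes.toBytes("d");
> > >         byte[] qual = Bytes.toBytes("v");
> > >         try (Table byA = conn.getTable(TableName.valueOf("events_A_B_time"));
> > >              Table byB = conn.getTable(TableName.valueOf("events_B_A_time"))) {
> > >             byA.put(new Put(Bytes.toBytes(a + "_" + b + "_" + time))
> > >                     .addColumn(cf, qual, value));
> > >             byB.put(new Put(Bytes.toBytes(b + "_" + a + "_" + time))
> > >                     .addColumn(cf, qual, value));
> > >         }
> > >     }
> > >
> > >     // Query B%: one small prefix scan on the B_A_time table instead of
> > >     // many scans over the A_B_time table.
> > >     public static void queryByB(Connection conn, String b) throws IOException {
> > >         try (Table byB = conn.getTable(TableName.valueOf("events_B_A_time"))) {
> > >             Scan scan = new Scan();
> > >             scan.setRowPrefixFilter(Bytes.toBytes(b + "_"));
> > >             try (ResultScanner rs = byB.getScanner(scan)) {
> > >                 for (Result r : rs) {
> > >                     System.out.println(Bytes.toString(r.getRow()));
> > >                 }
> > >             }
> > >         }
> > >     }
> > > }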
> > >
> > > Hope this is helpful for you.
> > >
> > >
> > >
> > >
> > > --
> > >
> > >
> > > Thanks & Regards,
> > > 李剑 Jameson Li
> > > Focus on Hadoop,Mysql
> > >
> >
>
>
>
> --
>
>
> Thanks & Regards,
> 李剑 Jameson Li
> Focus on Hadoop,Mysql
>
