hbase-user mailing list archives

From: Young Y Kim <yoon...@gmail.com>
Subject: Re: HBase Is So Slow To Save Data?
Date: Thu, 30 Aug 2012 08:44:54 GMT
In my experience, you should keep inserts under 15k/s per region server to
avoid GC and compaction problems.
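Below is a minimal sketch of client-side pacing along these lines, written in Java like the code quoted further down. It assumes a single writer that sends puts in batches and can pause between them, so the client's rate roughly equals the load on one region server. That holds for the poster's pseudo-distributed setup but not for a multi-server cluster. `WriteThrottle` is a hypothetical helper, not an HBase class, and 15k/s is the rule of thumb from this message, not an HBase setting.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical client-side throttle: keeps the average submission rate
 * at or below maxOpsPerSecond, measured from startNanos.
 */
final class WriteThrottle {
    private final double maxOpsPerSecond;
    private final long startNanos;
    private long opsSubmitted;

    WriteThrottle(double maxOpsPerSecond, long startNanos) {
        if (maxOpsPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        this.maxOpsPerSecond = maxOpsPerSecond;
        this.startNanos = startNanos;
    }

    /** Records a batch of ops and returns how many nanoseconds to wait before the next batch. */
    long recordBatch(int ops, long nowNanos) {
        opsSubmitted += ops;
        // Earliest instant at which this many ops are allowed at the target rate.
        long allowedAtNanos = startNanos + (long) (opsSubmitted / maxOpsPerSecond * 1_000_000_000L);
        return Math.max(0L, allowedAtNanos - nowNanos);
    }

    /** Records a batch against the wall clock and sleeps as long as needed. */
    void throttle(int ops) throws InterruptedException {
        long waitNanos = recordBatch(ops, System.nanoTime());
        TimeUnit.NANOSECONDS.sleep(waitNanos); // no-op when waitNanos <= 0
    }
}
```

For example, construct it with `new WriteThrottle(15_000, System.nanoTime())` and call `throttle(batch.size())` after each `table.put(batch)`.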

On Thu, Aug 30, 2012 at 1:45 AM, Bing Li <lblabs@gmail.com> wrote:

> Dear Cristofer,
>
> Thanks so much for the reminder!
>
> Best regards,
> Bing
>
> On Thu, Aug 30, 2012 at 12:32 AM, Cristofer Weber <cristofer.weber@neogrid.com> wrote:
>
> > There are also a lot of conversions of the same values to their byte array
> > representation, e.g., your NeighborStructure constants. You should do each
> > conversion only once to save time, since you are doing them inside 3 nested
> > loops. I'm not sure how much this will improve things, but you should try
> > this as well.
> >
> > Best regards,
> > Cristofer
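Cristofer's suggestion can be sketched in plain Java. Constant family and qualifier names are encoded once, in static fields. Values that only change per outer-loop iteration are encoded once per iteration rather than once per innermost row. The constant names and string values are hypothetical stand-ins for the poster's `NeighborStructure` fields, and `toBytes` mimics HBase's `Bytes.toBytes(String)`, which also encodes as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for pre-encoded column names; values are illustrative only. */
final class NeighborColumns {
    // Encoded once at class-load time instead of inside the nested loops.
    static final byte[] FAMILY = toBytes("hub_hub_neighbor");
    static final byte[] HUB_KEY = toBytes("hub_key");
    static final byte[] GROUP_KEY = toBytes("group_key");
    static final byte[] NODE_KEY = toBytes("node_key");

    private NeighborColumns() {}

    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Builds {family, qualifier, value} triples for one outer-loop iteration.
     * hubKey and groupKey are fixed for the iteration, so they are encoded once;
     * only neighborKey changes per row.
     */
    static List<byte[][]> encodeCells(List<String> neighborKeys, String hubKey, String groupKey) {
        byte[] hubValue = toBytes(hubKey);
        byte[] groupValue = toBytes(groupKey);
        List<byte[][]> cells = new ArrayList<>(neighborKeys.size() * 3);
        for (String neighborKey : neighborKeys) {
            cells.add(new byte[][] {FAMILY, HUB_KEY, hubValue});
            cells.add(new byte[][] {FAMILY, GROUP_KEY, groupValue});
            cells.add(new byte[][] {FAMILY, NODE_KEY, toBytes(neighborKey)});
        }
        return cells;
    }
}
```

The same `byte[]` instances are reused for every row. This is safe only as long as nothing downstream mutates them.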
> >
> > -----Original Message-----
> > From: Bing Li [mailto:lblabs@gmail.com]
> > Sent: Wednesday, August 29, 2012 13:07
> > To: user@hbase.apache.org
> > Cc: hbase-user@hadoop.apache.org
> > Subject: Re: HBase Is So Slow To Save Data?
> >
> > I see. Thanks so much!
> >
> > Bing
> >
> >
> > On Wed, Aug 29, 2012 at 11:59 PM, N Keywal <nkeywal@gmail.com> wrote:
> >
> > > It's not useful here: if you have a memory issue, it's while you're using
> > > the list, not when you have finished with it and set it to null.
> > > You need to monitor the memory consumption of the JVM, on both the client
> > > and the server.
> > > Google these keywords; there are many examples on the web.
> > > Google ArrayList initialization as well.
> > >
> > > Note as well that what matters is not the size of the structure on disk
> > > but the in-memory size of the "List<Put> puts = new ArrayList<Put>();"
> > > before the table put.
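Here is a plain-Java sketch of the two suggestions in this reply. The first is reading the JVM heap in use while the list is being built. `Runtime` gives only a rough figure, and GC logs or a tool such as `jstat` give a fuller picture. The second is allocating the list at its final capacity up front. Reading "ArrayList initialization" as presizing is an assumption. `PutListSizing` and its methods are hypothetical helpers, and the map shape mirrors `hhOutNeighborMap` from the quoted code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helpers for watching heap use and presizing the put list. */
final class PutListSizing {
    private PutListSizing() {}

    /** Heap currently in use by this JVM, in bytes (approximate: GC can run at any moment). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Counts innermost entries of a map shaped like hhOutNeighborMap. */
    static int countNeighbors(Map<String, ? extends Map<String, Set<String>>> neighborMap) {
        int n = 0;
        for (Map<String, Set<String>> groups : neighborMap.values()) {
            for (Set<String> neighbors : groups.values()) {
                n += neighbors.size();
            }
        }
        return n;
    }

    /** Allocates the backing array once instead of letting ArrayList grow by repeated copying. */
    static <T> List<T> presizedList(int neighborCount, int itemsPerNeighbor) {
        return new ArrayList<>(Math.multiplyExact(neighborCount, itemsPerNeighbor));
    }
}
```

In the quoted method, `presizedList(countNeighbors(hhOutNeighborMap), 6)` could stand in for `new ArrayList<Put>()`, since six `Put`s are added per neighbor. Logging `usedHeapBytes()` before and after the loops shows roughly how much heap the list occupies.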
> > >
> > > On Wed, Aug 29, 2012 at 5:42 PM, Bing Li <lblabs@gmail.com> wrote:
> > >
> > > > Dear N Keywal,
> > > >
> > > > Thanks so much for your reply!
> > > >
> > > > The total amount of data is about 110M. The available memory, 2G, is
> > > > enough.
> > > >
> > > > In Java, I just set the collection to null so that it can be garbage
> > > > collected. Do you think that is fine?
> > > >
> > > > Best regards,
> > > > Bing
> > > >
> > > >
> > > > On Wed, Aug 29, 2012 at 11:22 PM, N Keywal <nkeywal@gmail.com> wrote:
> > > >
> > > >> Hi Bing,
> > > >>
> > > >> You should expect HBase to be slower in the generic case:
> > > >> 1) it writes much more data (see the HBase data model), with extra
> > > >> column qualifiers, timestamps and so on.
> > > >> 2) the data is written multiple times: once in the write-ahead log,
> > > >> once per replica on the datanodes, and so on.
> > > >> 3) there are inter-process and inter-machine calls on the critical
> > > >> path.
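Point 1 can be made concrete with HBase's classic `KeyValue` layout. Each cell carries a 4-byte key length, a 4-byte value length, a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type, followed by the value. The sketch below adds that up for one cell. It ignores tags, block encoding, compression, the WAL and HDFS replication, so it estimates a single serialized cell only. `CellSizeEstimate` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical estimate of one cell's size in the classic KeyValue serialization. */
final class CellSizeEstimate {
    // key length (4) + value length (4) + row length (2) + family length (1)
    // + timestamp (8) + key type (1)
    static final int FIXED_OVERHEAD_BYTES = 4 + 4 + 2 + 1 + 8 + 1; // = 20

    private CellSizeEstimate() {}

    static int keyValueBytes(String row, String family, String qualifier, byte[] value) {
        return FIXED_OVERHEAD_BYTES
                + utf8Length(row)
                + utf8Length(family)
                + utf8Length(qualifier)
                + value.length;
    }

    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Take a hypothetical 36-byte row key, a 1-byte family, an 8-byte qualifier and an 8-byte value. One cell is then 20 + 36 + 1 + 8 + 8 = 73 bytes for 8 bytes of payload. The quoted code writes six such cells per row, and each one repeats the row key. Points 2 and 3 multiply this further.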
> > > >>
> > > >> This is the cost of the atomicity, reliability and scalability features.
> > > >> With these features in mind, HBase is reasonably fast at saving data
> > > >> on a cluster.
> > > >>
> > > >> In your specific case (leaving aside points 2 and 3 above), the
> > > >> performance seems very bad.
> > > >>
> > > >> You should first look at:
> > > >> - how much time is spent in the put vs. preparing the list
> > > >> - is garbage collection going on? Or even swapping?
> > > >> - what's the size of your final array vs. the available memory?
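The first two checks can be sketched using only the JDK. Wall time per phase comes from `System.nanoTime()`, and GC time comes from the `GarbageCollectorMXBean`s returned by `ManagementFactory`. Swapping has to be checked at the OS level (for example with `vmstat`) and is not covered. `PhaseTimer` and the phase labels are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

/** Hypothetical helper that reports wall time and GC time for one phase of work. */
final class PhaseTimer {
    private PhaseTimer() {}

    /** Total collection time in ms across all collectors; -1 (unsupported) values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the phase, prints its wall time and the GC time that accrued meanwhile, returns its result. */
    static <T> T timed(String label, Supplier<T> phase) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        T result = phase.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcMillis() - gcBefore;
        System.out.printf("%s: %d ms wall, %d ms in GC%n", label, elapsedMs, gcMs);
        return result;
    }
}
```

For example, wrap a hypothetical `buildPuts(hhOutNeighborMap)` helper in `timed("build list", ...)`. Then add a second `timed` call around `table.put(puts)`. Its `IOException` must be caught inside the lambda, because `Supplier` cannot throw checked exceptions.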
> > > >>
> > > >> Cheers,
> > > >>
> > > >> N.
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Aug 29, 2012 at 4:08 PM, Bing Li <lblabs@gmail.com> wrote:
> > > >>
> > > >>> Dear all,
> > > >>>
> > > >>> By the way, my HBase runs in pseudo-distributed mode. Thanks!
> > > >>>
> > > >>> Best regards,
> > > >>> Bing
> > > >>>
> > > >>> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li <lblabs@gmail.com> wrote:
> > > >>>
> > > >>> > Dear all,
> > > >>> >
> > > >>> > In my experience, it is very slow for HBase to save data. Am I
> > > >>> > right?
> > > >>> >
> > > >>> > For example, today I needed to save the data in a HashMap to HBase.
> > > >>> > It took more than three hours. However, saving the same HashMap to a
> > > >>> > file in text format with a redirected System.out took only 4.5
> > > >>> > seconds!
> > > >>> >
> > > >>> > Why is HBase so slow? Is it indexing?
> > > >>> >
> > > >>> > My code to save data in HBase is as follows. I think the code must
> > > >>> > be correct.
> > > >>> >
> > > >>> >         ......
> > > >>> >         public synchronized void AddVirtualOutgoingHHNeighbors(ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap, int timingScale)
> > > >>> >         {
> > > >>> >                 List<Put> puts = new ArrayList<Put>();
> > > >>> >
> > > >>> >                 String hhNeighborRowKey;
> > > >>> >                 Put hubKeyPut;
> > > >>> >                 Put groupKeyPut;
> > > >>> >                 Put topGroupKeyPut;
> > > >>> >                 Put timingScalePut;
> > > >>> >                 Put nodeKeyPut;
> > > >>> >                 Put hubNeighborTypePut;
> > > >>> >
> > > >>> >                 for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry : hhOutNeighborMap.entrySet())
> > > >>> >                 {
> > > >>> >                         for (Map.Entry<String, Set<String>> groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet())
> > > >>> >                         {
> > > >>> >                                 for (String neighborKey : groupNeighborEntry.getValue())
> > > >>> >                                 {
> > > >>> >                                         hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW + Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() + groupNeighborEntry.getKey() + timingScale + neighborKey);
> > > >>> >
> > > >>> >                                         hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > > >>> >                                         hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN), Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
> > > >>> >                                         puts.add(hubKeyPut);
> > > >>> >
> > > >>> >                                         groupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > > >>> >                                         groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN), Bytes.toBytes(groupNeighborEntry.getKey()));
> > > >>> >                                         puts.add(groupKeyPut);
> > > >>> >
> > > >>> >                                         topGroupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > > >>> >                                         topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN), Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
> > > >>> >                                         puts.add(topGroupKeyPut);
> > > >>> >
> > > >>> >                                         timingScalePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > > >>> >                                         timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN), Bytes.toBytes(timingScale));
> > > >>> >                                         puts.add(timingScalePut);
> > > >>> >
> > > >>> >                                         nodeKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > > >>> >                                         nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN), Bytes.toBytes(neighborKey));
> > > >>> >                                         puts.add(nodeKeyPut);
> > > >>> >
> > > >>> >                                         hubNeighborTypePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > > >>> >                                         hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN), Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
> > > >>> >                                         puts.add(hubNeighborTypePut);
> > > >>> >                                 }
> > > >>> >                         }
> > > >>> >                 }
> > > >>> >
> > > >>> >                 try
> > > >>> >                 {
> > > >>> >                         this.neighborTable.put(puts);
> > > >>> >                 }
> > > >>> >                 catch (IOException e)
> > > >>> >                 {
> > > >>> >                         e.printStackTrace();
> > > >>> >                 }
> > > >>> >         }
> > > >>> >         ......
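N Keywal's reply points at the size of the `List<Put>` before `table.put`. One way to bound it is to hand puts to the table in fixed-size chunks instead of building the whole list first. This is a plain-Java sketch. `ChunkedWriter` is a hypothetical helper, and in the HBase case the callback would wrap `table.put(batch)` and catch its `IOException`. The chunk size is an assumption to tune, not a recommended value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical buffer that passes items to a flush callback in chunks of at most
 * chunkSize, so the pending list never grows beyond that size.
 */
final class ChunkedWriter<T> implements AutoCloseable {
    private final int chunkSize;
    private final Consumer<List<T>> flusher;
    private List<T> pending;

    ChunkedWriter(int chunkSize, Consumer<List<T>> flusher) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.flusher = flusher;
        this.pending = new ArrayList<>(chunkSize);
    }

    void add(T item) {
        pending.add(item);
        if (pending.size() >= chunkSize) {
            flush();
        }
    }

    /** Hands over the pending items; a fresh list is allocated so the callback may keep the old one. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>(chunkSize);
        flusher.accept(batch);
    }

    /** Flushes any remainder when used in try-with-resources. */
    @Override
    public void close() {
        flush();
    }
}
```

With a chunk size of 1,000 and 2,500 items, the callback receives batches of 1,000, 1,000 and 500. At most 1,000 items are held at any time.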
> > > >>> >
> > > >>> > Thanks so much!
> > > >>> >
> > > >>> > Best regards,
> > > >>> > Bing
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>
