hbase-user mailing list archives

From Bing Li <lbl...@gmail.com>
Subject Re: HBase Is So Slow To Save Data?
Date Wed, 29 Aug 2012 16:07:19 GMT
I see. Thanks so much!

Bing


On Wed, Aug 29, 2012 at 11:59 PM, N Keywal <nkeywal@gmail.com> wrote:

> It's not useful here: if you have a memory issue, it occurs while you're using the
> list, not after you have finished with it and set it to null.
> You need to monitor the memory consumption of the JVM, on both the client and
> the server.
> Search for these keywords; there are many examples on the web.
> Search as well for ArrayList initialization.
>
> Note as well that what matters is not the size of the data on
> disk but the size of the "List<Put> puts = new ArrayList<Put>();" before
> the table put.
>
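As one way to act on the advice above, the following is a minimal sketch of checking heap usage and garbage-collection activity from inside a JVM, and of presizing an ArrayList. The class name `HeapProbe` and its methods are hypothetical and written for illustration only; the calls into `java.lang.management` are standard JDK APIs. It covers only the JVM it runs in, so the server side would need to be checked separately.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the thread): reports heap usage and GC counts. */
final class HeapProbe {

    /** Bytes of heap currently in use, as reported by the JVM. */
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Sum of collection counts over all collectors; undefined (-1) counts are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Builds a list with its capacity set up front, which avoids repeated internal resizing. */
    static List<byte[]> buildList(int expectedSize, int payloadBytes) {
        List<byte[]> list = new ArrayList<>(expectedSize);
        for (int i = 0; i < expectedSize; i++) {
            list.add(new byte[payloadBytes]);
        }
        return list;
    }

    public static void main(String[] args) {
        long heapBefore = usedHeapBytes();
        long gcBefore = totalGcCount();

        // Assumed sizes, for illustration only.
        List<byte[]> list = buildList(100_000, 64);

        System.out.println("items held:       " + list.size());
        System.out.println("heap used before: " + heapBefore + " bytes");
        System.out.println("heap used after:  " + usedHeapBytes() + " bytes");
        System.out.println("GC runs meanwhile: " + (totalGcCount() - gcBefore));
    }
}
```

A large number of GC runs while the list is being built, or heap usage close to the configured maximum, would point at the list itself rather than at the table put.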
> On Wed, Aug 29, 2012 at 5:42 PM, Bing Li <lblabs@gmail.com> wrote:
>
> > Dear N Keywal,
> >
> > Thanks so much for your reply!
> >
> > The total amount of data is about 110M. The available memory, 2G, should
> > be enough.
> >
> > In Java, I just set the collection to null so that it can be
> > garbage-collected. Do you think that is fine?
> >
> > Best regards,
> > Bing
> >
> >
> > On Wed, Aug 29, 2012 at 11:22 PM, N Keywal <nkeywal@gmail.com> wrote:
> >
> >> Hi Bing,
> >>
> >> You should expect HBase to be slower in the generic case:
> >> 1) it writes much more data (see the HBase data model), with extra column
> >> qualifiers, timestamps, and so on.
> >> 2) the data is written multiple times: once in the write-ahead log, once
> >> per replica on the datanodes, and so on.
> >> 3) there are inter-process and inter-machine calls on the critical
> >> path.
> >>
> >> This is the cost of the atomicity, reliability and scalability features.
> >> With these features in mind, HBase is reasonably fast to save data on a
> >> cluster.
> >>
> >> In your specific case (without points 2 and 3 above), the performance
> >> seems to be very bad.
> >>
> >> You should first look at:
> >> - how much time is spent in the put vs. preparing the list
> >> - is garbage collection going on? Or even swapping?
> >> - what's the size of your final array vs. the available memory?
> >>
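The first check in the list above (time spent in the put vs. preparing the list) could be measured along these lines. This is only a sketch: `PhaseTimer` and `measure` are hypothetical names, and the write phase is passed in as a callback so that the example runs without an HBase cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper (not from the thread): times list preparation and the write separately. */
final class PhaseTimer {

    /** Elapsed wall-clock time of each phase, in milliseconds. */
    static final class Result {
        final long prepareMillis;
        final long writeMillis;
        final int itemCount;

        Result(long prepareMillis, long writeMillis, int itemCount) {
            this.prepareMillis = prepareMillis;
            this.writeMillis = writeMillis;
            this.itemCount = itemCount;
        }
    }

    /** Runs the prepare step, then the write step, timing each with System.nanoTime(). */
    static <T> Result measure(Supplier<List<T>> prepare, Consumer<List<T>> write) {
        long start = System.nanoTime();
        List<T> items = prepare.get();
        long prepared = System.nanoTime();
        write.accept(items);
        long written = System.nanoTime();
        return new Result((prepared - start) / 1_000_000,
                          (written - prepared) / 1_000_000,
                          items.size());
    }

    public static void main(String[] args) {
        Result r = PhaseTimer.<String>measure(
                () -> {
                    // Stand-in for building the list of puts.
                    List<String> items = new ArrayList<>();
                    for (int i = 0; i < 100_000; i++) {
                        items.add("row-" + i);
                    }
                    return items;
                },
                items -> {
                    // Stand-in for the real write, for example the table put.
                });
        System.out.println("items: " + r.itemCount
                + ", prepare ms: " + r.prepareMillis
                + ", write ms: " + r.writeMillis);
    }
}
```

If nearly all of the time falls in the prepare phase, the slowdown is on the client (list building, hashing, garbage collection) rather than in HBase.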
> >> Cheers,
> >>
> >> N.
> >>
> >>
> >>
> >> On Wed, Aug 29, 2012 at 4:08 PM, Bing Li <lblabs@gmail.com> wrote:
> >>
> >>> Dear all,
> >>>
> >>> By the way, my HBase is in the pseudo-distributed mode. Thanks!
> >>>
> >>> Best regards,
> >>> Bing
> >>>
> >>> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li <lblabs@gmail.com> wrote:
> >>>
> >>> > Dear all,
> >>> >
> >>> > In my experience, HBase is very slow at saving data. Am I right?
> >>> >
> >>> > For example, today I needed to save the data in a HashMap to HBase. It
> >>> > took more than three hours. However, saving the same HashMap to a file
> >>> > in text format with System.out redirected took only 4.5 seconds!
> >>> >
> >>> > Why is HBase so slow? Is it indexing?
> >>> >
> >>> > My code to save the data in HBase is as follows. I think the code must
> >>> > be correct.
> >>> >
> >>> >     ......
> >>> >     public synchronized void AddVirtualOutgoingHHNeighbors(
> >>> >             ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap,
> >>> >             int timingScale)
> >>> >     {
> >>> >         List<Put> puts = new ArrayList<Put>();
> >>> >
> >>> >         String hhNeighborRowKey;
> >>> >         Put hubKeyPut;
> >>> >         Put groupKeyPut;
> >>> >         Put topGroupKeyPut;
> >>> >         Put timingScalePut;
> >>> >         Put nodeKeyPut;
> >>> >         Put hubNeighborTypePut;
> >>> >
> >>> >         for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry
> >>> >                 : hhOutNeighborMap.entrySet())
> >>> >         {
> >>> >             for (Map.Entry<String, Set<String>> groupNeighborEntry
> >>> >                     : sourceHubGroupNeighborEntry.getValue().entrySet())
> >>> >             {
> >>> >                 for (String neighborKey : groupNeighborEntry.getValue())
> >>> >                 {
> >>> >                     hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW
> >>> >                             + Tools.GetAHash(sourceHubGroupNeighborEntry.getKey()
> >>> >                                     + groupNeighborEntry.getKey() + timingScale + neighborKey);
> >>> >
> >>> >                     hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                     hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                             Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
> >>> >                             Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
> >>> >                     puts.add(hubKeyPut);
> >>> >
> >>> >                     groupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                     groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                             Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
> >>> >                             Bytes.toBytes(groupNeighborEntry.getKey()));
> >>> >                     puts.add(groupKeyPut);
> >>> >
> >>> >                     topGroupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                     topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                             Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
> >>> >                             Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
> >>> >                     puts.add(topGroupKeyPut);
> >>> >
> >>> >                     timingScalePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                     timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                             Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
> >>> >                             Bytes.toBytes(timingScale));
> >>> >                     puts.add(timingScalePut);
> >>> >
> >>> >                     nodeKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                     nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                             Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
> >>> >                             Bytes.toBytes(neighborKey));
> >>> >                     puts.add(nodeKeyPut);
> >>> >
> >>> >                     hubNeighborTypePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                     hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                             Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN),
> >>> >                             Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
> >>> >                     puts.add(hubNeighborTypePut);
> >>> >                 }
> >>> >             }
> >>> >         }
> >>> >
> >>> >         try
> >>> >         {
> >>> >             this.neighborTable.put(puts);
> >>> >         }
> >>> >         catch (IOException e)
> >>> >         {
> >>> >             e.printStackTrace();
> >>> >         }
> >>> >     }
> >>> >     ......
> >>> >
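The method above collects every Put in a single list before one call to the table put, so the whole data set sits in client memory at once. One possible way to keep that list bounded is to flush it in fixed-size batches. The sketch below shows the idea with a hypothetical `BatchingBuffer` class that is not part of HBase; the sink is a plain callback so the example runs on its own, and the batch size of 1000 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not part of HBase): buffers items and hands them to a sink in batches. */
final class BatchingBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> pending;
    private int flushCount = 0;

    BatchingBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.pending = new ArrayList<>(batchSize);
    }

    /** Adds one item and flushes automatically once the batch is full. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any pending items to the sink; call once more after the last add. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(pending);
        flushCount++;
        // Start a fresh list so the flushed batch can be garbage-collected.
        pending = new ArrayList<>(batchSize);
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        // The sink only records batch sizes here; a real sink would write the batch out.
        BatchingBuffer<String> buffer =
                new BatchingBuffer<>(1000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            buffer.add("row-" + i);
        }
        buffer.flush();
        System.out.println("batches written: " + batchSizes);
    }
}
```

In the method above, the sink would presumably wrap `this.neighborTable.put(batch)`, including the IOException handling that the original code already has, since a `Consumer` cannot throw checked exceptions.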
> >>> > Thanks so much!
> >>> >
> >>> > Best regards,
> >>> > Bing
> >>> >
> >>>
> >>
> >>
> >
>
