hbase-user mailing list archives

From Cristofer Weber <cristofer.we...@neogrid.com>
Subject RES: HBase Is So Slow To Save Data?
Date Wed, 29 Aug 2012 16:32:47 GMT
There are also many conversions of the same values to their byte array representation, e.g. your NeighborStructure
constants. You should do each of these conversions only once to save time, since you are currently doing them
inside three nested loops. I'm not sure how much this will improve things, but you should try it as well.
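
As a rough illustration of that suggestion, the sketch below converts the constants once and reuses the resulting arrays inside the loop. It is self-contained and uses `String.getBytes(StandardCharsets.UTF_8)` as a stand-in for HBase's `Bytes.toBytes(String)`; the class name, constant names, and constant values are hypothetical placeholders for the NeighborStructure constants, not the poster's actual definitions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: FAMILY and HUB_KEY_COLUMN are hypothetical stand-ins for the
// NeighborStructure string constants used in the original code.
final class HoistedConstants {
    static final String FAMILY = "hub_hub_neighbor_family";
    static final String HUB_KEY_COLUMN = "hub_key";

    // Converted once, when the class is loaded, instead of once per iteration.
    static final byte[] FAMILY_BYTES = toBytes(FAMILY);
    static final byte[] HUB_KEY_COLUMN_BYTES = toBytes(HUB_KEY_COLUMN);

    // Stand-in for Bytes.toBytes(String); here the string is encoded as UTF-8.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Builds one {family, qualifier, value} triple per key. Only the value,
    // which really changes per iteration, is converted inside the loop.
    static List<byte[][]> buildCells(List<String> hubKeys) {
        List<byte[][]> cells = new ArrayList<>(hubKeys.size());
        for (String hubKey : hubKeys) {
            cells.add(new byte[][] {FAMILY_BYTES, HUB_KEY_COLUMN_BYTES, toBytes(hubKey)});
        }
        return cells;
    }
}
```

The cached arrays are shared between cells, so this only works if nothing downstream modifies them in place.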

Best regards,
Cristofer

-----Original Message-----
From: Bing Li [mailto:lblabs@gmail.com]
Sent: Wednesday, August 29, 2012 13:07
To: user@hbase.apache.org
Cc: hbase-user@hadoop.apache.org
Subject: Re: HBase Is So Slow To Save Data?

I see. Thanks so much!

Bing


On Wed, Aug 29, 2012 at 11:59 PM, N Keywal <nkeywal@gmail.com> wrote:

> It's not useful here: if you have a memory issue, it's when you're using 
> the list, not when you have finished with it and set it to null.
> You need to monitor the memory consumption of the JVM, on both the client 
> and the server.
> Google these keywords; there are many examples on the web.
> Google ArrayList initialization as well.
>
> Note as well that what matters is not the size of the structure on 
> disk but the size of the "List<Put> puts = new ArrayList<Put>();" 
> list before the table put.
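
One way to keep that in-memory list small is to hand it off in fixed-size batches rather than collecting everything first. The sketch below is a generic, self-contained buffer (Java 8 or later); `BatchingBuffer` and its `sink` callback are hypothetical names, and the sink is only assumed to wrap something like the table put from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: buffers items and hands them to a sink in fixed-size batches,
// so the in-memory list never holds more than batchSize elements.
final class BatchingBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // e.g. a wrapper around a table put
    private List<T> buffer;

    BatchingBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        // Pre-sizing avoids repeated growth of the backing array.
        this.buffer = new ArrayList<>(batchSize);
    }

    // Adds one item and flushes automatically when the batch is full.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends any buffered items to the sink; call once more after the last add.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(buffer);
        // Start a fresh list in case the sink keeps a reference to the old one.
        buffer = new ArrayList<>(batchSize);
    }
}
```

A suitable batch size depends on the size of each item and the available heap, so it would need to be found by measurement.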
>
> On Wed, Aug 29, 2012 at 5:42 PM, Bing Li <lblabs@gmail.com> wrote:
>
> > Dear N Keywal,
> >
> > Thanks so much for your reply!
> >
> > The total amount of data is about 110M. The available memory is 
> > enough, 2G.
> >
> > In Java, I just set a collection to NULL to collect garbage. Do you 
> > think it is fine?
> >
> > Best regards,
> > Bing
> >
> >
> > On Wed, Aug 29, 2012 at 11:22 PM, N Keywal <nkeywal@gmail.com> wrote:
> >
> >> Hi Bing,
> >>
> >> You should expect HBase to be slower in the generic case:
> >> 1) it writes much more data (see the HBase data model), with extra 
> >> column qualifiers, timestamps and so on.
> >> 2) the data is written multiple times: once in the write-ahead log, 
> >> once per replica on the datanodes, and so on.
> >> 3) there are inter-process calls and inter-machine calls on the 
> >> critical path.
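
To give a rough sense of point 1, the sketch below estimates the serialized size of one cell. It assumes the classic HBase KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type, plus the row, family, qualifier, and value bytes). The class name is hypothetical, and the estimate ignores the write-ahead log, replication, and block-level overhead.

```java
// Sketch only: per-cell size under the assumed KeyValue layout described above.
final class CellSizeEstimate {
    // key length + value length + row length + family length + timestamp + type
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    // Every cell repeats its row key, family, and qualifier next to the value.
    static long cellSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED_OVERHEAD + rowLen + familyLen + qualifierLen + valueLen;
    }
}
```

With made-up lengths of 40 bytes for the row key, 20 for the family, 20 for the qualifier, and 10 for the value, `CellSizeEstimate.cellSize(40, 20, 20, 10)` gives 110 bytes for 10 bytes of payload, which is why long row keys and column names inflate the data written.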
> >>
> >> This is the cost of the atomicity, reliability and scalability features.
> >> With these features in mind, HBase is reasonably fast to save data 
> >> on a cluster.
> >>
> >> In your specific case (without points 2 and 3 above), the 
> >> performance seems to be very bad.
> >>
> >> You should first look at:
> >> - how much time is spent in the put vs. preparing the list
> >> - is garbage collection going on? Or even swapping?
> >> - what's the size of your final array vs. the available memory?
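
The first and third checks can be approximated with a few lines of plain Java. The sketch below is a minimal helper, assuming Java 8 or later; `PhaseTimer` is a hypothetical name, and the heap figure is only a snapshot that includes garbage not yet collected.

```java
// Sketch only: helpers for timing a phase and sampling heap usage.
final class PhaseTimer {
    // Runs the task and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    // Approximate heap currently in use by this JVM, in bytes.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        java.util.List<String> items = new java.util.ArrayList<>();
        // Phase 1: preparing the list (placeholder work).
        long prepareMs = timeMillis(() -> {
            for (int i = 0; i < 100_000; i++) {
                items.add("row" + i);
            }
        });
        // Phase 2: the table put would go here; left empty in this sketch.
        long putMs = timeMillis(() -> { });
        System.out.println("prepare ms: " + prepareMs + ", put ms: " + putMs);
        System.out.println("used heap bytes: " + usedHeapBytes());
    }
}
```

For the second check, starting the JVM with `-verbose:gc` prints a line for each collection, which shows whether garbage collection is active during the run.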
> >>
> >> Cheers,
> >>
> >> N.
> >>
> >>
> >>
> >> On Wed, Aug 29, 2012 at 4:08 PM, Bing Li <lblabs@gmail.com> wrote:
> >>
> >>> Dear all,
> >>>
> >>> By the way, my HBase is in the pseudo-distributed mode. Thanks!
> >>>
> >>> Best regards,
> >>> Bing
> >>>
> >>> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li <lblabs@gmail.com> wrote:
> >>>
> >>> > Dear all,
> >>> >
> >>> > In my experience, HBase is very slow at saving data. Am I right?
> >>> >
> >>> > For example, today I needed to save the data in a HashMap to HBase. It 
> >>> > took more than three hours. However, saving the same HashMap to a file 
> >>> > in text format with the redirected System.out took only 4.5 seconds!
> >>> >
> >>> > Why is HBase so slow? Is it indexing?
> >>> >
> >>> > My code to save data in HBase is as follows. I think the code 
> >>> > must be correct.
> >>> >
> >>> >         ......
> >>> >         public synchronized void AddVirtualOutgoingHHNeighbors(
> >>> >                         ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap,
> >>> >                         int timingScale)
> >>> >         {
> >>> >                 List<Put> puts = new ArrayList<Put>();
> >>> >
> >>> >                 String hhNeighborRowKey;
> >>> >                 Put hubKeyPut;
> >>> >                 Put groupKeyPut;
> >>> >                 Put topGroupKeyPut;
> >>> >                 Put timingScalePut;
> >>> >                 Put nodeKeyPut;
> >>> >                 Put hubNeighborTypePut;
> >>> >
> >>> >                 for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry :
> >>> >                                 hhOutNeighborMap.entrySet())
> >>> >                 {
> >>> >                         for (Map.Entry<String, Set<String>> groupNeighborEntry :
> >>> >                                         sourceHubGroupNeighborEntry.getValue().entrySet())
> >>> >                         {
> >>> >                                 for (String neighborKey : groupNeighborEntry.getValue())
> >>> >                                 {
> >>> >                                         hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW +
> >>> >                                                 Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() +
> >>> >                                                 groupNeighborEntry.getKey() + timingScale + neighborKey);
> >>> >
> >>> >                                         hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                                         hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                                                 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
> >>> >                                                 Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
> >>> >                                         puts.add(hubKeyPut);
> >>> >
> >>> >                                         groupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                                         groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                                                 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
> >>> >                                                 Bytes.toBytes(groupNeighborEntry.getKey()));
> >>> >                                         puts.add(groupKeyPut);
> >>> >
> >>> >                                         topGroupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                                         topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                                                 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
> >>> >                                                 Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
> >>> >                                         puts.add(topGroupKeyPut);
> >>> >
> >>> >                                         timingScalePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                                         timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                                                 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
> >>> >                                                 Bytes.toBytes(timingScale));
> >>> >                                         puts.add(timingScalePut);
> >>> >
> >>> >                                         nodeKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                                         nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                                                 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
> >>> >                                                 Bytes.toBytes(neighborKey));
> >>> >                                         puts.add(nodeKeyPut);
> >>> >
> >>> >                                         hubNeighborTypePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                                         hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                                                 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN),
> >>> >                                                 Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
> >>> >                                         puts.add(hubNeighborTypePut);
> >>> >                                 }
> >>> >                         }
> >>> >                 }
> >>> >
> >>> >                 try
> >>> >                 {
> >>> >                         this.neighborTable.put(puts);
> >>> >                 }
> >>> >                 catch (IOException e)
> >>> >                 {
> >>> >                         e.printStackTrace();
> >>> >                 }
> >>> >         }
> >>> >         ......
> >>> >
> >>> > Thanks so much!
> >>> >
> >>> > Best regards,
> >>> > Bing
> >>> >
> >>>
> >>
> >>
> >
>
