hbase-user mailing list archives

From Jack Levin <magn...@gmail.com>
Subject Re: Storing images in Hbase
Date Sun, 20 Jan 2013 19:49:38 GMT
I forgot to mention that I also have this setup:

<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>33554432</value>
  <description>Flush more often. Default: 67108864</description>
</property>

This parameter applies per region, so if any of the (currently) 400 regions
on one of my regionservers accumulates 32 MB (33554432 bytes) in its
memstore, HBase will flush that memstore to disk.
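To see why the global write-cache limit discussed below still matters, a quick back-of-the-envelope using the numbers above (purely illustrative):

```shell
# Worst case: every region's memstore fills to the flush threshold at once.
FLUSH_BYTES=33554432   # hbase.hregion.memstore.flush.size (32 MB)
REGIONS=400            # regions per regionserver, from above
TOTAL_MB=$(( FLUSH_BYTES * REGIONS / 1024 / 1024 ))
echo "worst-case aggregate memstore: ${TOTAL_MB} MB"   # 12800 MB
```

12.8 GB would dwarf a 5 GB heap, which is why the global memstore limit forces flushes long before every region reaches its own threshold.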


Here are some metrics from a regionserver:

requests=2, regions=370, stores=370, storefiles=1390,
storefileIndexSize=304, memstoreSize=2233, compactionQueueSize=0,
flushQueueSize=0, usedHeap=3516, maxHeap=4987,
blockCacheSize=790656256, blockCacheFree=255245888,
blockCacheCount=2436, blockCacheHitCount=218015828,
blockCacheMissCount=13514652, blockCacheEvictedCount=2561516,
blockCacheHitRatio=94, blockCacheHitCachingRatio=98

Note that the memstore total is only about 2 GB; this particular regionserver's heap is set to 5 GB.
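As a sanity check, the reported hit ratio can be re-derived from the raw counters above (a quick sketch):

```shell
# Recompute blockCacheHitRatio from the hit/miss counters in the metrics line.
HITS=218015828     # blockCacheHitCount
MISSES=13514652    # blockCacheMissCount
RATIO=$(( HITS * 100 / (HITS + MISSES) ))
echo "block cache hit ratio: ${RATIO}%"   # 94%, matching blockCacheHitRatio=94
```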

And last but not least, it's very important to have a good GC setup:

export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xms5000m \
-XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails \
-XX:+PrintGCDateStamps \
-XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
-XX:MaxTenuringThreshold=15 -XX:SurvivorRatio=8 \
-XX:+UseParNewGC \
-XX:NewSize=128m -XX:MaxNewSize=128m \
-XX:-UseAdaptiveSizePolicy \
-XX:+CMSParallelRemarkEnabled \
-XX:-TraceClassUnloading"

-Jack

On Thu, Jan 17, 2013 at 3:29 PM, Varun Sharma <varun@pinterest.com> wrote:
> Hey Jack,
>
> Thanks for the useful information. By flush size being 15%, do you mean
> the memstore flush size? 15% would mean close to 1 GB; have you seen any
> issues with flushes taking too long?
>
> Thanks
> Varun
>
> On Sun, Jan 13, 2013 at 8:17 AM, Jack Levin <magnito@gmail.com> wrote:
>
>> That's right: memstore size, not flush size, is increased.  Filesize is
>> 10G. Overall write cache is 60% of heap and read cache is 20%.  Flush size
>> is 15%.  64 maxlogs at 128MB. One namenode server, and one secondary that can
>> be promoted.  On the way to HBase, images are written to a queue, so that we
>> can take HBase down for maintenance and still do inserts later.  ImageShack
>> has ‘perma cache’ servers that allow writes and serving of data even when
>> HBase is down for hours; consider it a 4th replica 😉 outside of Hadoop
>>
>> Jack
>>
>>  *From:* Mohit Anchlia <mohitanchlia@gmail.com>
>> *Sent:* ‎January‎ ‎13‎, ‎2013 ‎7‎:‎48‎ ‎AM
>> *To:* user@hbase.apache.org
>> *Subject:* Re: Storing images in Hbase
>>
>> Thanks Jack for sharing this information. This definitely makes sense when
>> using that type of caching layer. You mentioned increasing the write
>> cache; I am assuming you had to increase the following parameters in
>> addition to increasing the memstore size:
>>
>> hbase.hregion.max.filesize
>> hbase.hregion.memstore.flush.size
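For reference, the 60% write cache / 20% read cache / 64 maxlogs split Jack describes maps roughly onto properties like these (a sketch for 0.90-era HBase; exact names and defaults vary by version):

```
<!-- ~60% of heap for the aggregate write cache (all memstores) -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.6</value>
</property>
<!-- ~20% of heap for the block (read) cache -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.2</value>
</property>
<!-- 64 WALs before flushes are forced -->
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>64</value>
</property>
<!-- 10 GB region split threshold -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
</property>
```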
>>
>> On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin <magnito@gmail.com> wrote:
>>
>> > We buffer all accesses to HBase with a Varnish SSD-based caching layer,
>> > so the impact for reads is negligible.  We have a 70-node cluster, 8 GB
>> > of RAM per node, relatively weak nodes (Intel Core 2 Duo), with
>> > 10-12 TB of disk per server.  We insert 600,000 images per day.  We
>> > have relatively little compaction activity, as we made our write
>> > cache much larger than the read cache, so we don't experience region file
>> > fragmentation as much.
>> >
>> > -Jack
>> >
>> > On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>> > > I think it really depends on the volume of traffic, data distribution per
>> > > region, how and when file compaction occurs, and the number of nodes in
>> > > the cluster. In my experience, when it comes to blob data where you are
>> > > serving tens of thousands of requests/sec, writes and reads, it's very
>> > > difficult to manage HBase without very hard operations and maintenance in
>> > > play. Jack earlier mentioned they have 1 billion images; it would be
>> > > interesting to know what they see in terms of compactions and requests per
>> > > sec. I'd be surprised if a high-volume site could do it without any caching
>> > > layer on top to alleviate the IO spikes that occur because of GC and
>> > > compactions.
>> > >
>> > > On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq <dontariq@gmail.com> wrote:
>> > >
>> > >> IMHO, if the image files are not too huge, HBase can efficiently serve
>> > >> the purpose. You can store some additional info along with the file,
>> > >> depending upon your search criteria, to make the search faster. Say, if
>> > >> you want to fetch images by type, you can store the image in one column
>> > >> and its extension in another column (jpg, tiff, etc).
>> > >>
>> > >> BTW, what exactly is the problem you are facing? You have written
>> > >> "But I still cant do it".
>> > >>
>> > >> Warm Regards,
>> > >> Tariq
>> > >> https://mtariq.jux.com/
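In HBase shell terms, Tariq's two-column layout might look like this (a sketch; the table, family, and row key names are illustrative, not from the thread):

```
hbase> create 'images', 'd'
hbase> put 'images', 'img-00001', 'd:data', '<binary image bytes>'
hbase> put 'images', 'img-00001', 'd:ext', 'jpg'
hbase> get 'images', 'img-00001', 'd:ext'
```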
>> > >>
>> > >>
>> > >> On Fri, Jan 11, 2013 at 8:30 PM, Michael Segel <michael_segel@hotmail.com> wrote:
>> > >>
>> > >> > That's a viable option.
>> > >> > HDFS reads are faster than HBase, but it would require first hitting
>> > >> > the index in HBase, which points to the file, and then fetching the file.
>> > >> > It could be faster... we found storing binary data in a sequence file
>> > >> > indexed in HBase to be faster than HBase itself; however, YMMV, and
>> > >> > HBase has been improved since we did that project....
>> > >> >
>> > >> >
>> > >> > On Jan 10, 2013, at 10:56 PM, shashwat shriparv <dwivedishashwat@gmail.com> wrote:
>> > >> >
>> > >> > > Hi Kavish,
>> > >> > >
>> > >> > > I have a better idea for you: copy your image files into a single
>> > >> > > file on HDFS, and if a new image comes, append it to the existing
>> > >> > > file, and keep the metadata and offset updated in HBase. Because if
>> > >> > > you put bigger images in HBase it will lead to some issues.
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > ∞
>> > >> > > Shashwat Shriparv
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > On Fri, Jan 11, 2013 at 9:21 AM, lars hofhansl <larsh@apache.org>
>> > >> wrote:
>> > >> > >
>> > >> > >> Interesting. That's close to a PB if my math is correct.
>> > >> > >> Is there a write-up about this somewhere? Something that we could
>> > >> > >> link to from the HBase homepage?
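The petabyte figure checks out under plausible assumptions, e.g. an average image of ~300 KB (an assumption, not stated in the thread) and 3x HDFS replication:

```shell
# Back-of-the-envelope: 1 billion images, assumed ~300 KB average, 3 replicas.
IMAGES=1000000000
AVG_BYTES=300000
REPLICAS=3
TOTAL_TB=$(( IMAGES * AVG_BYTES * REPLICAS / 1000000000000 ))
echo "~${TOTAL_TB} TB on disk"   # ~900 TB, i.e. close to a petabyte
```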
>> > >> > >>
>> > >> > >> -- Lars
>> > >> > >>
>> > >> > >>
>> > >> > >> ----- Original Message -----
>> > >> > >> From: Jack Levin <magnito@gmail.com>
>> > >> > >> To: user@hbase.apache.org
>> > >> > >> Cc: Andrew Purtell <apurtell@apache.org>
>> > >> > >> Sent: Thursday, January 10, 2013 9:24 AM
>> > >> > >> Subject: Re: Storing images in Hbase
>> > >> > >>
>> > >> > >> We stored about 1 billion images in HBase, with file sizes up to
>> > >> > >> 10MB. It's been running for close to 2 years without issues and
>> > >> > >> serves delivery of images for Yfrog and ImageShack.  If you have
>> > >> > >> any questions about the setup, I would be glad to answer them.
>> > >> > >>
>> > >> > >> -Jack
>> > >> > >>
>> > >> > >> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>> > >> > >>> I have done extensive testing and have found that blobs don't
>> > >> > >>> belong in databases but are best left out on the file system.
>> > >> > >>> Andrew outlined the issues that you'll face, not to mention the IO
>> > >> > >>> issues when compaction occurs over large files.
>> > >> > >>>
>> > >> > >>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <apurtell@apache.org> wrote:
>> > >> > >>>
>> > >> > >>>> I meant this to say "a few really large values"
>> > >> > >>>>
>> > >> > >>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <apurtell@apache.org> wrote:
>> > >> > >>>>
>> > >> > >>>>> Consider if the split threshold is 2 GB but your one row
>> > >> > >>>>> contains 10 GB as a really large value.
>> > >> > >>>>
>> > >> > >>>>
>> > >> > >>>>
>> > >> > >>>>
>> > >> > >>>> --
>> > >> > >>>> Best regards,
>> > >> > >>>>
>> > >> > >>>>   - Andy
>> > >> > >>>>
>> > >> > >>>> Problems worthy of attack prove their worth by hitting back.
>> > >> > >>>> - Piet Hein (via Tom White)
>> > >> > >>>>
>> > >> > >>
>> > >> > >>
>> > >> >
>> > >> >
>> > >>
>> >
>>
