hbase-user mailing list archives

From Ascot Moss <ascot.m...@gmail.com>
Subject Re: To Store Large Number of Video and Image files
Date Sun, 17 Apr 2016 06:36:08 GMT
Hi,

Yes, the files are immutable.

Regards


On Sun, Apr 17, 2016 at 12:25 PM, Vladimir Rodionov <vladrodionov@gmail.com>
wrote:

> >> I have a project that needs to store a large number of image and video
> >> files; the file size varies from 10MB to 10GB. The initial number of
> >> files will be 0.1 billion and would grow over 1 billion. What are the
> >> practical recommendations to store and view these files?
> >>
> Are the files immutable?
> Write small files (less than one HDFS block) into large blobs (i.e.,
> combine them into a single file) and store large files directly in HDFS.
> Keep a path index in HBase.
>
> If you need to delete files, mark them as deleted in HBase and
> periodically run a GC job to perform the actual cleanup.
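As an editor's illustration of the scheme Vlad outlines (not code from the thread), here is a minimal Python sketch. Plain dicts stand in for the HBase path index and the HDFS namespace; the names and single-blob layout are illustrative assumptions:

```python
# Sketch of the "pack small files into blobs, write large files directly,
# index paths in HBase" approach. Stand-ins: path_index ~ HBase table,
# hdfs ~ HDFS namespace, blob_store ~ one blob file being packed.
HDFS_BLOCK_SIZE = 128 * 1024 * 1024  # 128MB, the default discussed in the thread

path_index = {}  # file_id -> {"path", "offset", "length", "deleted"}
blob_store = []  # chunks appended to the current blob file
hdfs = {}        # large files written directly, keyed by path

def store_file(file_id, data):
    """Route by size: pack sub-block files into a blob, write large ones directly."""
    if len(data) < HDFS_BLOCK_SIZE:
        offset = sum(len(d) for d in blob_store)
        blob_store.append(data)
        path_index[file_id] = {"path": "/blobs/blob-0", "offset": offset,
                               "length": len(data), "deleted": False}
    else:
        path = f"/large/{file_id}"
        hdfs[path] = data
        path_index[file_id] = {"path": path, "offset": 0,
                               "length": len(data), "deleted": False}

def delete_file(file_id):
    """Soft delete: only mark the index entry; a GC job reclaims space later."""
    path_index[file_id]["deleted"] = True

def gc():
    """Periodic GC: drop index entries (and directly stored files) marked deleted."""
    for file_id, meta in list(path_index.items()):
        if meta["deleted"]:
            hdfs.pop(meta["path"], None)
            del path_index[file_id]
```

A real implementation would also rewrite blobs to reclaim space freed by deleted small files; the sketch only shows the routing and soft-delete bookkeeping.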
>
> -Vlad
>
> On Sat, Apr 16, 2016 at 7:35 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > There was HBASE-15370 for backport but it was decided not to backport the
> > feature.
> >
> > FYI
> >
> > On Sat, Apr 16, 2016 at 7:26 PM, Ascot Moss <ascot.moss@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > About HBase-11339,
> > > "The size of the MOB data should not be very large; it is better to
> > > keep the MOB size between 100KB and 10MB. Since MOB cells are written
> > > into the memstore before flushing, large MOB cells stress the memory
> > > in region servers."
> > >
> > > Can this be mitigated by providing more RAM in the region servers? For
> > > instance, each server in the cluster has 768GB RAM + 14 x 6TB HDD.
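For context (an editorial aside, not from the thread): in HBase 2.0, where HBASE-11339 landed, MOB is enabled per column family. A minimal sketch in the HBase shell, with the table and family names as placeholder assumptions:

```ruby
# HBase shell: enable MOB on family 'f' of a hypothetical 'media' table.
# Cells larger than MOB_THRESHOLD (in bytes; default 100KB) are stored
# as MOB files instead of regular StoreFiles.
create 'media', {NAME => 'f', IS_MOB => true, MOB_THRESHOLD => 102400}
```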
> > >
> > > regards
> > >
> > >
> > >
> > > On Sun, Apr 17, 2016 at 9:56 AM, Ascot Moss <ascot.moss@gmail.com>
> > wrote:
> > >
> > > > Thanks Ted!
> > > >
> > > > Just visited HBASE-11339; its status is "Resolved", but its fix
> > > > version is 2.0.0.
> > > > How can it be patched into the current HBase stable version (v1.1.4)?
> > > >
> > > > About fault tolerance at the datacenter level: I am considering
> > > > HBase replication to replicate HBase tables to another (backup)
> > > > cluster. Is there any real-world reference on replication
> > > > performance, for instance when the bandwidth is 100MB/s?
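A back-of-envelope check on that bandwidth question (the 1PB dataset size below is an assumed example, not a figure from the thread):

```python
# Time to replicate a dataset over a link with a given sustained bandwidth.
def replication_time_days(dataset_bytes, bandwidth_bytes_per_s=100 * 1024**2):
    """Days to push dataset_bytes at bandwidth_bytes_per_s (default 100MB/s)."""
    return dataset_bytes / bandwidth_bytes_per_s / 86400

# e.g. an assumed 1 PB of media at a sustained 100MB/s:
days = replication_time_days(1024**5)  # roughly four months
```

At those rates, initial bulk replication of a petabyte-scale media set takes months, which is why steady-state replication usually only ships the ongoing write delta.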
> > > >
> > > > Regards
> > > >
> > > > On Sun, Apr 17, 2016 at 9:40 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > >> Have you taken a look at HBASE-11339 (HBase MOB) ?
> > > >>
> > > >> Note: this feature does not handle 10GB objects well. Consider
> > > >> storing GB-sized images directly on HDFS.
> > > >>
> > > >> Cheers
> > > >>
> > > >> On Sat, Apr 16, 2016 at 6:21 PM, Ascot Moss <ascot.moss@gmail.com>
> > > wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > I have a project that needs to store a large number of image and
> > > >> > video files; the file size varies from 10MB to 10GB. The initial
> > > >> > number of files will be 0.1 billion and would grow over 1 billion.
> > > >> > What are the practical recommendations to store and view these
> > > >> > files?
> > > >> >
> > > >> >
> > > >> >
> > > >> > #1 One cluster: store the HDFS URL in HBase and the actual file
> > > >> > in HDFS? (block_size of 128MB and replication factor of 3)
> > > >> >
> > > >> >
> > > >> > #2 One cluster: store small files in HBase directly and use #1
> > > >> > for large files? (block_size of 128MB and replication factor of 3)
> > > >> >
> > > >> >
> > > >> > #3 Multiple Hadoop/HBase clusters, each with different block_size
> > > >> > settings?
> > > >> >
> > > >> >      e.g. cluster 1 (small): block_size of 128MB and replication
> > > >> > factor of 3; store all files in HBase if their size is smaller
> > > >> > than 128MB
> > > >> >
> > > >> >             cluster 2 (large): bigger block_size, say 4GB, and
> > > >> > replication factor of 3; store the HDFS URL in HBase and the
> > > >> > actual file in HDFS
> > > >> >
> > > >> >
> > > >> > #4 Use HDFS Federation for a large number of files?
> > > >> >
> > > >> > About fault tolerance, I need to consider four types of failures:
> > > >> > driver, host, rack, and datacenter failures.
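One key-design detail worth noting for a path index over 1B+ rows (an editorial aside, not advice given in the thread): hash-prefix the row keys so lexically sequential file paths do not hotspot a single region. A minimal sketch, with the prefix length as an illustrative choice:

```python
import hashlib

def row_key(file_path):
    """Prefix the path with a short hash so 1B+ keys spread evenly
    across regions (a common HBase key-design pattern)."""
    prefix = hashlib.md5(file_path.encode()).hexdigest()[:4]
    return f"{prefix}:{file_path}"
```

The trade-off is that range scans over paths become scatter-gathers, which is usually acceptable for a point-lookup path index.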
> > > >> >
> > > >> >
> > > >> > Regards
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
