Since there are millions of files (ranging in size from 1 MB to 15 MB), I
would like to pack them into SequenceFiles. How do I store the location of
each of these files in HBase?
I see lots of blogs and books talking about storing large files on HDFS and
storing file paths in HBase, but I don't see any real examples. I was
wondering if anybody has implemented this in production.
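For what it's worth, below is the kind of flow I am picturing. This is a
rough, untested sketch: the table name "doc_index", the column family
"loc", and the one-SequenceFile-per-batch layout are placeholders I made
up, not anything I found documented.

  import java.io.IOException;
  import java.util.Map;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class SequenceFilePacker {

    // Pack small files into one SequenceFile and record each file's
    // location (container path + byte offset) as a row in HBase.
    public static void pack(Map<String, byte[]> files, Path seqPath)
        throws IOException {
      Configuration conf = HBaseConfiguration.create();
      try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
               SequenceFile.Writer.file(seqPath),
               SequenceFile.Writer.keyClass(Text.class),
               SequenceFile.Writer.valueClass(BytesWritable.class));
           Connection conn = ConnectionFactory.createConnection(conf);
           Table index = conn.getTable(TableName.valueOf("doc_index"))) {
        for (Map.Entry<String, byte[]> f : files.entrySet()) {
          long offset = writer.getLength();  // record start, valid seek target
          writer.append(new Text(f.getKey()),
                        new BytesWritable(f.getValue()));
          Put put = new Put(Bytes.toBytes(f.getKey()));  // row key = file name
          put.addColumn(Bytes.toBytes("loc"), Bytes.toBytes("path"),
              Bytes.toBytes(seqPath.toString()));
          put.addColumn(Bytes.toBytes("loc"), Bytes.toBytes("offset"),
              Bytes.toBytes(offset));
          index.put(put);
        }
      }
    }
  }

Retrieval would be the reverse: read "path" and "offset" from HBase, open a
SequenceFile.Reader on that path, seek(offset), and next() out the record
(the SequenceFile javadoc says positions returned by Writer.getLength() are
valid seek targets). Does that match what people run in production?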
Looking forward to a reply from the community experts. Thanks.
Regards,
Arun
On Sun, Feb 21, 2016 at 10:30 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> For #1, please take a look at
> hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
>
> e.g. the following methods:
>
>   public DFSInputStream open(String src) throws IOException
>
>   public HdfsDataOutputStream append(final String src, final int buffersize,
>       EnumSet<CreateFlag> flag, final Progressable progress,
>       final FileSystem.Statistics statistics) throws IOException
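>
> In practice you would usually go through the public FileSystem API rather
> than DFSClient directly (DistributedFileSystem delegates to DFSClient
> under the hood). A minimal, untested sketch of PUT/GET-style streaming;
> the buffer size and path arguments are arbitrary choices, not required
> values:
>
>   import java.io.InputStream;
>   import java.io.OutputStream;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>   import org.apache.hadoop.io.IOUtils;
>
>   public class HdfsStreaming {
>
>     // PUT: stream an incoming document into a new HDFS file.
>     public static void put(InputStream doc, String dst) throws Exception {
>       FileSystem fs = FileSystem.get(new Configuration());
>       try (OutputStream out = fs.create(new Path(dst))) {
>         IOUtils.copyBytes(doc, out, 64 * 1024, false);
>       }
>     }
>
>     // GET: stream an HDFS file out to the caller.
>     public static void get(String src, OutputStream sink) throws Exception {
>       FileSystem fs = FileSystem.get(new Configuration());
>       try (InputStream in = fs.open(new Path(src))) {
>         IOUtils.copyBytes(in, sink, 64 * 1024, false);
>       }
>     }
>   }
>
> For appending to an existing file, fs.append(path) maps down to the
> DFSClient append shown above.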
>
>
> Cheers
>
> On Wed, Feb 17, 2016 at 3:40 PM, Arun Patel <arunp.bigdata@gmail.com>
> wrote:
>
> > I would like to store large documents (over 100 MB) on HDFS and insert
> > metadata in HBase.
> >
> > 1) Users will use the HBase REST API for PUT and GET requests to store
> > and retrieve documents. In this case, how do I PUT and GET the documents
> > to/from HDFS? What are the recommended ways of storing and accessing
> > documents on HDFS that provide optimum performance?
> >
> > Can you please share any sample code or a GitHub project?
> >
> > 2) What are the performance issues I need to be aware of?
> >
> > Regards,
> > Arun
> >
>