hadoop-common-user mailing list archives

From "Chanchal James" <chanc...@gmail.com>
Subject Re: Question about Hadoop
Date Thu, 12 Jun 2008 18:22:28 GMT
Haijun, I have most of the settings at their defaults, but not the tmp dir.
I have hadoop.tmp.dir set to
"/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}". Is this a good
location?
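For reference, that location is set by overriding the default in
hadoop-site.xml, something like this (a sketch of the override; the
enclosing <configuration> element is implied):

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}</value>
  <description>A base for other temporary directories, kept outside /tmp
  so it survives system cleanup of /tmp.</description>
</property>
```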


On Thu, Jun 12, 2008 at 12:59 PM, Haijun Cao <haijun@kindsight.net> wrote:

>
> "While testing I had to delete the temporary "datastore" folder and
> reformat
> the file system a couple of times."
>
> Is that because you left hadoop.tmp.dir and the other *.dir parameters at
> their defaults? Try setting hadoop.tmp.dir to a directory not under /tmp.
>
> <property>
>  <name>hadoop.tmp.dir</name>
>  <value>/tmp/hadoop-${user.name}</value>
>  <description>A base for other temporary directories.</description>
> </property>
>
> dfs.name.dir defaults to ${hadoop.tmp.dir}/dfs/name:
>
> <property>
>  <name>dfs.name.dir</name>
>  <value>${hadoop.tmp.dir}/dfs/name</value>
> </property>
>
> Haijun
>
> -----Original Message-----
> From: Chanchal James [mailto:chanch13@gmail.com]
> Sent: Thursday, June 12, 2008 10:16 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Question about Hadoop
>
> Thanks Lohit for the info. I have one more question.
> If I keep all data in HDFS, is there any way I can back it up regularly?
> While testing I had to delete the temporary "datastore" folder and
> reformat the file system a couple of times. So when using Hadoop in a
> real environment, what are the chances of such uncorrectable
> software-side problems occurring? Can they be corrected without a
> reformat? I cannot afford to lose the data I plan to put in HDFS.
>
> Thank you.
>
> On Thu, Jun 12, 2008 at 12:02 PM, lohit <lohit_bv@yahoo.com> wrote:
>
> > Ideally what you would want is your data to be on HDFS, and to run
> > your map/reduce jobs on that data. The Hadoop framework splits your
> > data and feeds those splits to each map or reduce task. One problem
> > with image files is that you will not be able to split them.
> > Alternatively, people have done this: they wrap image files in XML and
> > create huge files that contain multiple image files. Hadoop offers
> > something called streaming, with which you will be able to split the
> > files at XML boundaries and feed them to your map/reduce tasks.
> > Streaming also enables you to use any code, like perl/php/c++.
> > Check info about streaming here:
> > http://hadoop.apache.org/core/docs/r0.17.0/streaming.html
> > And information about parsing XML files in streaming here:
> > http://hadoop.apache.org/core/docs/r0.17.0/streaming.html#How+do+I+parse+XML+documents+using+streaming%3F
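> > A minimal sketch of such a streaming mapper is below (Python here, but
> > any language that reads stdin works; the <image>/<name> element layout
> > is an illustrative assumption, not anything Hadoop prescribes):

```python
#!/usr/bin/env python
# Sketch of a Hadoop Streaming mapper: each input record is assumed to be
# one <image>...</image> XML block, as delivered when the job is run with
# an XML-boundary record reader. The element names are illustrative.
import re
import sys


def map_record(record):
    """Extract the image name from one XML record; return a tab-separated
    key/value line, or None if the record has no <name> element."""
    m = re.search(r'<name>(.*?)</name>', record)
    if m is None:
        return None
    # Streaming expects tab-separated key/value pairs on stdout.
    return "%s\t1" % m.group(1)


if __name__ == "__main__":
    # In a real job, Hadoop pipes the record splits to us on stdin.
    for line in sys.stdin:
        out = map_record(line.strip())
        if out is not None:
            print(out)
```

> > Such a mapper would be wired in with the StreamXmlRecordReader option
> > described in the streaming documentation linked above.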
> >
> > Thanks,
> > Lohit
> >
> > ----- Original Message ----
> > From: Chanchal James <chanch13@gmail.com>
> > To: core-user@hadoop.apache.org
> > Sent: Thursday, June 12, 2008 9:42:46 AM
> > Subject: Question about Hadoop
> >
> > Hi,
> >
> > I have a question about Hadoop. I am a beginner and just testing
> > Hadoop. I would like to know how a PHP application would benefit from
> > this, say an application that needs to work on a large number of image
> > files. Do I have to store the application in HDFS always, or do I just
> > copy it to HDFS when needed, do the processing, and then copy it back
> > to the local file system? Is that the case with the data files too?
> > Once I have Hadoop running, do I keep all data & application files in
> > HDFS always, and not use local file system storage?
> >
> > Thank you.
> >
> >
>
