hadoop-common-user mailing list archives

From Andy <lamfeel...@126.com>
Subject Re:how to use hadoop in real life?
Date Tue, 07 Jul 2009 12:20:43 GMT
Hi, before you resubmit your program, please make sure your output path is "valid". What counts
as valid depends on your OutputFormat. In the case of FileOutputFormat, it means that the
output directory must not already exist, so try deleting it first.
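
As a rough illustration of that "clear the output directory before resubmitting" step, here is a minimal sketch that uses only java.nio on a local directory as a stand-in. The class name OutputDirCleaner and the method deleteIfExists are hypothetical names chosen for this example. Against HDFS the same logic would go through Hadoop's FileSystem class (an exists check followed by a recursive delete) rather than java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: local-filesystem stand-in for clearing a job's output directory.
public class OutputDirCleaner {

    /** Deletes dir and everything under it; returns false if dir did not exist. */
    public static boolean deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a leftover output directory from a previous run.
        Path out = Files.createTempDirectory("impressions_output");
        Files.createFile(out.resolve("part-00000"));

        System.out.println("removed old output: " + deleteIfExists(out));
        System.out.println("still exists: " + Files.exists(out));
    }
}
```

Calling the helper right before job submission keeps a scheduled re-run from failing on a leftover directory, at the cost of discarding the previous results, so copy them elsewhere first if they are still needed.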
 
There are various ways to launch your Hadoop program. Try looking at some "classic" Hadoop
programs, such as Nutch, a search engine built entirely on Hadoop, to see how many ways there
are to deploy and run a Hadoop program.

To my knowledge, you can submit your Hadoop program from anywhere, as long as you can reach the
"master" machine over the network.
 
What kind of report?
It is not a good idea to replace your database with Hadoop storage. Hadoop is designed for
high I/O utilization when processing large volumes of data, so it is probably not appropriate
to treat Hadoop as a distributed database. BTW, the HBase project may be useful for you.
This is simple: using the HDFS APIs, you can do any file operation on HDFS remotely.
Java is a powerful language for most Hadoop programmers. Again, try to get familiar with some
Hadoop projects written in Java; you will find it convenient for what you want to do.
Best wishes
Song
On 2009-07-06 20:25:49, "Shravan Mahankali" <shravan.mahankali@catalytic.com> wrote:
>Hi Group,
>
>Finally I have written a sample Mapred program, submitted this job to Hadoop
>and got the expected results. Thanks to all of you!
>
>Now I don't have an idea of how to use Hadoop in real life (am sorry if am
>asking wrong question at wrong time.! (So, am right ;-))) :
>
>1) If I re-submit my job, Hadoop responds with an error message saying:
>org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
>hdfs://localhost:9000/user/root/impressions_output already exists
>
>2) How to automatically execute Hadoop jobs? let's say I have set a cron job
>which runs various Hadoop jobs at specified times. Is this the way we do in
>Hadoop world?
>
>3) Can I submit jobs to Hadoop from a different machine/ network/ domain?
>
>4) I would like to generate reports from the data collected in the Hadoop.
>How can I do that?
>
>5) Am thinking of replacing data in my database with Hadoop and query Hadoop
>for various information. Is this correct?
>
>6) How can I access analyzed data in Hadoop from external world, external
>program?
>
>NOTE: I would like to use Java for any of above implementations.
>
>Thanks in advance,
>
>Shravan Kumar. M
>
>Catalytic Software Ltd. [SEI-CMMI Level 5 Company]
