oodt-dev mailing list archives

From Sheryl John <shery...@gmail.com>
Subject Re: Problem happened when I tried to run the script "crawler_launcher"
Date Thu, 09 Aug 2012 15:46:39 GMT
Hi Yunhee,

What are the error messages you get while running the crawler?

I ran into similar issues with the crawler the first time I tried it, too.
I went through the crawler user guide to understand the architecture, but I
only understood how it worked after running the crawler several times to
ingest files.
I agree we need to update the guide. If you want to know about the
MetExtractorProductCrawler and AutoDetectProductCrawler, the wiki page that
I mentioned before will give you an idea of how to get them working (it
mentions the config files that you need to write for those two crawlers).
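
For the earlier validation failure quoted below, here is a rough sketch of
why the trailing "/" matters. It only imitates the check with grep and does
not call any OODT code. The function name url_is_allowed is made up for
illustration, [0-9] stands in for \d because POSIX grep -E has no \d, and
the ^...$ anchors assume the launcher requires the whole value to match the
pattern, which is what the error message suggests but I have not confirmed.

```shell
#!/bin/sh
# Sketch only: approximates the filemgrUrl check reported in the error
# "Allowed values = [http://.*:\d*]". Not the launcher's actual code.

# Hypothetical helper: succeeds when the whole URL matches the pattern.
# Assumption: the validator does a full match, hence the ^ and $ anchors.
url_is_allowed() {
    printf '%s\n' "$1" | grep -Eq '^http://.*:[0-9]*$'
}

# Try the URL from the failing script and the one without the trailing "/".
for url in 'http://localhost:8000/' 'http://localhost:8000'; do
    if url_is_allowed "$url"; then
        echo "allowed:  $url"
    else
        echo "rejected: $url"
    fi
done
```

Under those assumptions the first URL is rejected, because after the port
digits the pattern leaves no room for a "/", and the second is allowed. So
in the script quoted below, the option would become
--filemgrUrl http://localhost:8000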



On Thu, Aug 9, 2012 at 6:27 AM, YunHee Kang <yunh.kang@gmail.com> wrote:

> Hi Chris,
>
> I got a bunch of error messages when running the crawler_launcher script.
> First off, I think I need to understand how a crawler works.
> Can I get some materials to help me write configuration files for
> crawler_launcher?
>
> Honestly, I am not familiar with the Crawler.
> But I will try to file a JIRA issue to update the Crawler user guide.
>
> Thanks,
> Yunhee
>
>
>
> 2012/8/9 Mattmann, Chris A (388J) <chris.a.mattmann@jpl.nasa.gov>:
> > Hi YunHee,
> >
> > Sorry, we need to update the docs, that is for sure. Can you help
> > us remember by filing a JIRA issue to update the Crawler user
> > guide and to fix the URL there?
> >
> > As for crawlerId, yes it's obsolete, you can find the modern
> > 0.4 and 0.5-trunk options by running ./crawler_launcher -h
> >
> > Cheers,
> > Chris
> >
> > On Aug 7, 2012, at 7:03 AM, YunHee Kang wrote:
> >
> >> Hi Chris and Sheryl,
> >>
> >> I understood my mistake after fixing the URL that ended with "/".
> >> But the Apache OODT homepage
> >> (http://oodt.apache.org/components/maven/crawler/user/) uses a wrong
> >> URL as an option of crawler_launcher:
> >> --filemgrUrl http://localhost:9000/ \
> >> That confused me.
> >>
> >> I tried to run the command below, following the Apache OODT home
> >> page:
> >> $ ./crawler_launcher --crawlerId MetExtractorProductCrawler
> >> ERROR: Invalid option: 'crawlerId'
> >>
> >> But the error shown above occurred.
> >> Is the option 'crawlerId' obsolete?
> >>
> >> Thanks,
> >> Yunhee
> >>
> >>
> >> 2012/8/7 Mattmann, Chris A (388J) <chris.a.mattmann@jpl.nasa.gov>:
> >>> Perfect, Sheryl, my thoughts exactly.
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>> On Aug 6, 2012, at 10:01 AM, Sheryl John wrote:
> >>>
> >>>> Hi Yunhee,
> >>>>
> >>>> Check out this OODT wiki for crawler :
> >>>> https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
> >>>>
> >>>> Did you try giving 'http://localhost:8000' without the "/" at the end?
> >>>> Also, specify
> >>>> 'org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory'
> >>>> for the 'clientTransferer' option.
> >>>>
> >>>>
> >>>> On Mon, Aug 6, 2012 at 9:46 AM, YunHee Kang <yunh.kang@gmail.com> wrote:
> >>>>
> >>>>> Hi Chris,
> >>>>>
> >>>>> I got an error message when I tried to run crawler_launcher by using a
> >>>>> shell script. The error message may be caused by a wrong URL of
> >>>>> filemgr.
> >>>>> $ ./crawler_launcher.sh
> >>>>> ERROR: Validation Failures: - Value 'http://localhost:8000/' is not
> >>>>> allowed for option
> >>>>> [longOption='filemgrUrl',shortOption='fm',description='File Manager
> >>>>> URL'] - Allowed values = [http://.*:\d*]
> >>>>>
> >>>>> The following is the shell script that I wrote:
> >>>>> $ cat crawler_launcher.sh
> >>>>> #!/bin/sh
> >>>>> export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
> >>>>> ./crawler_launcher \
> >>>>>      -op --launchStdCrawler \
> >>>>>      --productPath $STAGE_AREA\
> >>>>>      --filemgrUrl http://localhost:8000/\
> >>>>>      --failureDir /tmp \
> >>>>>      --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
> >>>>>      --metFileExtension tmp \
> >>>>>      --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferer
> >>>>>
> >>>>> I am wondering if there is a problem in the URL of the filemgr or
> >>>>> elsewhere.
> >>>>>
> >>>>> Thanks,
> >>>>> Yunhee
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> -Sheryl
> >>>
> >>>
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> Chris Mattmann, Ph.D.
> >>> Senior Computer Scientist
> >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>> Office: 171-266B, Mailstop: 171-246
> >>> Email: chris.a.mattmann@nasa.gov
> >>> WWW:   http://sunset.usc.edu/~mattmann/
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> Adjunct Assistant Professor, Computer Science Department
> >>> University of Southern California, Los Angeles, CA 90089 USA
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>
> >
>



-- 
-Sheryl
