oodt-dev mailing list archives

From YunHee Kang <yunh.k...@gmail.com>
Subject Re: Problem happened when I tried to run the script "crawler_launcher"
Date Fri, 10 Aug 2012 03:19:50 GMT
Hi Sheryl,

First off, I tried to run crawler_launcher with the "-autoPC" option.
Then I got warning messages as follows:
Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product:
[/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5]
Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file
/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5.info.tmp
Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product:
[/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5.info.tmp]

I think the warning messages are related to the ingest preconditions.
In the run script for crawler_launcher below, I may have specified the
"-pids" option for the preconditions incorrectly.
#!/bin/sh
export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
./crawler_launcher \
      -op -stdPC \
      -mfx tmp \
      --productPath $STAGE_AREA \
      --filemgrUrl http://localhost:8000 \
      --failureDir /tmp \
      --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
      --metFileExtension tmp \
      -pids CheckThatDataFileSizeIsGreaterThanZero \
      --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
Could you let me know how to fix these warnings?

Next, I applied the metadata crawler options to the run script.
#!/bin/sh
export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
./crawler_launcher \
       -op -metPC \
       -pp $STAGE_AREA \
       -fm http://localhost:8000 \
       -mxc ../policy/crawler-config.xml \
       -mx org.apache.oodt.cas.metadata.extractors.ExternMetExtractor \
       -mxr ../policy/mime-extractor-map.xml \
       --failureDir /tmp \
       --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
       --metFileExtension tmp \
       --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory

This time I got the following error message:

ERROR: Failed to launch crawler : Error creating bean with name
'MetExtractorProductCrawler' defined in file
[/home/yhkang/oodt-0.5/cas-crawler-0.5-SNAPSHOT/bin/../policy/crawler-beans.xml]:
Error setting property values; nested exception is
org.springframework.beans.PropertyBatchUpdateException; nested
PropertyAccessExceptions (1) are:
PropertyAccessException 1:
org.springframework.beans.MethodInvocationException: Property
'metExtractor' threw exception; nested exception is
org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Failed
to parse config file : Failed to parser
'/home/yhkang/oodt-0.5/cas-crawler-0.5-SNAPSHOT/policy/crawler-config.xml'
: null

I used the file crawler-config.xml (shown below) from the policy
directory as-is.

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:p="http://www.springframework.org/schema/p"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
         http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
        <bean class="org.apache.oodt.cas.crawl.util.CasPropertyOverrideConfigurer" />
        <import resource="crawler-beans.xml" />
        <import resource="action-beans.xml" />
        <import resource="precondition-beans.xml" />
        <import resource="naming-beans.xml" />
</beans>

So I need to understand how to write the XML files (crawler-beans.xml,
action-beans.xml, etc.) that are imported into crawler-config.xml.
Could you share your experience with me?
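
For illustration only, my understanding is that the id passed to "-pids"
has to match a bean id declared in the imported precondition-beans.xml.
A minimal file might look like the sketch below; the class name here is a
placeholder I made up, so the real bean definitions should be copied from
the precondition-beans.xml shipped in the policy directory:

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
         http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
        <!-- The bean id is what crawler_launcher's -pids option refers to.
             The class below is hypothetical: use the one from the
             precondition-beans.xml in your policy directory. -->
        <bean id="CheckThatDataFileSizeIsGreaterThanZero"
              class="org.apache.oodt.cas.crawl.SomePreconditionComparator" />
</beans>
```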
Thanks,
Yunhee

2012/8/10 Sheryl John <sheryljj@gmail.com>:
> Hi Yunhee,
>
> What are the error messages you get while running the crawler?
>
> I've faced similar issues with the crawler when I tried it out for the
> first time too.
> I went through the crawler user guide to understand the architecture, and
> I only understood how it worked after running the crawler several times
> to ingest files.
> I agree we need to update the guide. If you want to know about the
> MetExtractorProductCrawler and AutoDetectProductCrawler, the wiki page
> that I mentioned before will give you an idea of how to get them working
> (it mentions the config files that you need to write for those two
> crawlers).
>
>
>
> On Thu, Aug 9, 2012 at 6:27 AM, YunHee Kang <yunh.kang@gmail.com> wrote:
>
>> Hi Chris,
>>
>> I got a bunch of error messages when running the crawler_launcher script.
>> First off, I think I need to understand how a crawler works.
>> Can I get some materials to help me write configuration files for
>> crawler_launcher ?
>>
>> Honestly, I am not familiar with the crawler.
>> But I will try to file a JIRA issue to update the Crawler user guide.
>>
>> Thanks,
>> Yunhee
>>
>>
>>
>> 2012/8/9 Mattmann, Chris A (388J) <chris.a.mattmann@jpl.nasa.gov>:
>> > Hi YunHee,
>> >
>> > Sorry, we need to update the docs, that is for sure. Can you help
>> > us remember by filing a JIRA issue to update the Crawler user
>> > guide and to fix the URL there?
>> >
>> > As for crawlerId, yes it's obsolete, you can find the modern
>> > 0.4 and 0.5-trunk options by running ./crawler_launcher -h
>> >
>> > Cheers,
>> > Chris
>> >
>> > On Aug 7, 2012, at 7:03 AM, YunHee Kang wrote:
>> >
>> >> Hi Chris and Sheryl,
>> >>
>> >> I understood my mistake after removing the trailing "/" from the URL.
>> >> But that wrong URL is still used as an option of crawler_launcher on
>> >> the Apache OODT homepage
>> >> (http://oodt.apache.org/components/maven/crawler/user/):
>> >> --filemgrUrl http://localhost:9000/ \
>> >> So it confused me.
>> >>
>> >> I tried to run the command mentioned below, following the Apache
>> >> OODT homepage.
>> >> $ ./crawler_launcher --crawlerId MetExtractorProductCrawler
>> >> ERROR: Invalid option: 'crawlerId'
>> >>
>> >> But the error described above occurred.
>> >> Is the option 'crawlerId' obsolete?
>> >>
>> >> Thanks,
>> >> Yunhee
>> >>
>> >>
>> >> 2012/8/7 Mattmann, Chris A (388J) <chris.a.mattmann@jpl.nasa.gov>:
>> >>> Perfect, Sheryl, my thoughts exactly.
>> >>>
>> >>> Cheers,
>> >>> Chris
>> >>>
>> >>> On Aug 6, 2012, at 10:01 AM, Sheryl John wrote:
>> >>>
>> >>>> Hi Yunhee,
>> >>>>
>> >>>> Check out this OODT wiki for crawler :
>> >>>> https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
>> >>>>
>> >>>> Did you try giving 'http://localhost:8000' without the "/" in the
>> end?
>> >>>> Also, specify
>> 'org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory'
>> >>>> for  'clientTransferer' option.
>> >>>>
>> >>>>
>> >>>> On Mon, Aug 6, 2012 at 9:46 AM, YunHee Kang <yunh.kang@gmail.com>
>> wrote:
>> >>>>
>> >>>>> Hi Chris,
>> >>>>>
>> >>>>> I got an error message when I tried to run crawler_launcher by
>> >>>>> using a shell script. The error message may be caused by a wrong
>> >>>>> URL for the filemgr.
>> >>>>> $ ./crawler_launcher.sh
>> >>>>> ERROR: Validation Failures: - Value 'http://localhost:8000/' is
>> >>>>> not allowed for option
>> >>>>> [longOption='filemgrUrl',shortOption='fm',description='File
>> >>>>> Manager URL'] - Allowed values = [http://.*:\d*]
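
The "Allowed values" pattern in that error already explains the failure:
assuming the option validator requires the whole value to match
http://.*:\d* (which the later fix of dropping the slash suggests), a
trailing "/" after the port digits cannot be consumed. A quick sketch:

```python
import re

# Pattern copied from the crawler_launcher validation message:
# Allowed values = [http://.*:\d*]
pattern = re.compile(r"http://.*:\d*")

# fullmatch mimics a validator that accepts only values matching entirely.
print(bool(pattern.fullmatch("http://localhost:8000")))   # True: host, then port digits
print(bool(pattern.fullmatch("http://localhost:8000/")))  # False: trailing "/" after the digits
```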
>> >>>>>
>> >>>>> The following is the shell script that I wrote:
>> >>>>> $ cat crawler_launcher.sh
>> >>>>> #!/bin/sh
>> >>>>> export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
>> >>>>> ./crawler_launcher \
>> >>>>>      -op --launchStdCrawler \
>> >>>>>      --productPath $STAGE_AREA \
>> >>>>>      --filemgrUrl http://localhost:8000/ \
>> >>>>>      --failureDir /tmp \
>> >>>>>      --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
>> >>>>>      --metFileExtension tmp \
>> >>>>>      --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferer
>> >>>>>
>> >>>>> I am wondering if there is a problem with the URL of the filemgr
>> >>>>> or elsewhere.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Yunhee
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> -Sheryl
>> >>>
>> >>>
>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>> Chris Mattmann, Ph.D.
>> >>> Senior Computer Scientist
>> >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> >>> Office: 171-266B, Mailstop: 171-246
>> >>> Email: chris.a.mattmann@nasa.gov
>> >>> WWW:   http://sunset.usc.edu/~mattmann/
>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>> Adjunct Assistant Professor, Computer Science Department
>> >>> University of Southern California, Los Angeles, CA 90089 USA
>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>>
>> >
>> >
>> >
>>
>
>
>
> --
> -Sheryl
