Subject: Re: Problem happened when I tried to run the script "crawler_launcher"
From: YunHee Kang
To: dev@oodt.apache.org
Date: Fri, 10 Aug 2012 12:19:50 +0900

Hi Sheryl,

First off, I tried to run crawler_launcher with the option "-autoPC". I then got warning messages as follows:

Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product: [/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5]
Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file /home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5.info.tmp
Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product: [/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5.info.tmp]

I think the warnings are related to the preconditions for ingest. Looking at the run script for crawler_launcher below, I suspect I have described the "-pids" option for the preconditions incorrectly.
#!/bin/sh
export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
./crawler_launcher \
-op -stdPC \
-mfx tmp \
--productPath $STAGE_AREA \
--filemgrUrl http://localhost:8000 \
--failureDir /tmp \
--actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
--metFileExtension tmp \
-pids CheckThatDataFileSizeIsGreaterThanZero \
--clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory

Let me know how to fix the warning.

Next I applied the metadata-crawler option to the run script:

#!/bin/sh
export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
./crawler_launcher \
-op -metPC \
-pp $STAGE_AREA \
-fm http://localhost:8000 \
-mxc ../policy/crawler-config.xml \
-mx org.apache.oodt.cas.metadata.extractors.ExternMetExtractor \
-mxr ../policy/mime-extractor-map.xml \
--failureDir /tmp \
--actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
--metFileExtension tmp \
--clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory

This time I got the following error message:

ERROR: Failed to launch crawler : Error creating bean with name 'MetExtractorProductCrawler' defined in file [/home/yhkang/oodt-0.5/cas-crawler-0.5-SNAPSHOT/bin/../policy/crawler-beans.xml]: Error setting property values; nested exception is org.springframework.beans.PropertyBatchUpdateException; nested PropertyAccessExceptions (1) are: PropertyAccessException 1: org.springframework.beans.MethodInvocationException: Property 'metExtractor' threw exception; nested exception is org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Failed to parse config file : Failed to parser '/home/yhkang/oodt-0.5/cas-crawler-0.5-SNAPSHOT/policy/crawler-config.xml' : null

I just used the crawler-config.xml file (as follows) from the policy directory. So I need to understand how to write the XML files (including crawler-beans.xml, action-beans.xml, etc.) that are imported into crawler-config.xml. Could you share your experience with me?

Thanks,
Yunhee
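For context on how those files fit together: crawler-config.xml, as described above, appears to be a plain Spring container whose only job is to import the other policy files, so a minimal sketch of it would look roughly like the following (the precondition-beans.xml file name and the schema details are assumptions, not taken from this thread):

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <!-- Crawler definitions: StdProductCrawler, MetExtractorProductCrawler, AutoDetectProductCrawler -->
  <import resource="crawler-beans.xml"/>

  <!-- Actions referenced by --actionIds, e.g. DeleteDataFile, MoveDataFileToFailureDir, Unique -->
  <import resource="action-beans.xml"/>

  <!-- Preconditions referenced by -pids, e.g. CheckThatDataFileSizeIsGreaterThanZero
       (precondition-beans.xml is an assumed file name) -->
  <import resource="precondition-beans.xml"/>

</beans>

Note that the "Failed to parse config file" error quoted above is thrown by ExternMetExtractor itself (via the metExtractor property), which suggests the file passed to -mxc needs to be an extractor-specific config file rather than this Spring container.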
2012/8/10 Sheryl John :
> Hi Yunhee,
>
> What are the error messages you get while running the crawler?
>
> I've faced similar issues with the crawler when I tried it out the first time too.
> I went through the crawler user guide to understand the architecture, and
> then understood how it worked only after running the crawler several times
> to ingest files.
> I agree we need to update the guide, and if you want to know about the
> MetExtractorProductCrawler and AutoDetectProductCrawler, the wiki page that
> I mentioned before will give you an idea of how to get them working (it mentions
> the config files that you need to write for those two crawlers).
>
>
> On Thu, Aug 9, 2012 at 6:27 AM, YunHee Kang wrote:
>
>> Hi Chris,
>>
>> I got a bunch of error messages when running the crawler_launcher script.
>> First off, I think I need to understand how a crawler works.
>> Can I get some materials to help me write configuration files for
>> crawler_launcher?
>>
>> Honestly, I am not familiar with the Crawler,
>> but I will try to file a JIRA issue to update the Crawler user guide.
>>
>> Thanks,
>> Yunhee
>>
>>
>> 2012/8/9 Mattmann, Chris A (388J) :
>> > Hi YunHee,
>> >
>> > Sorry, we need to update the docs, that is for sure. Can you help
>> > us remember by filing a JIRA issue to update the Crawler user
>> > guide and to fix the URL there?
>> >
>> > As for crawlerId, yes, it's obsolete; you can find the modern
>> > 0.4 and 0.5-trunk options by running ./crawler_launcher -h
>> >
>> > Cheers,
>> > Chris
>> >
>> > On Aug 7, 2012, at 7:03 AM, YunHee Kang wrote:
>> >
>> >> Hi Chris and Sheryl,
>> >>
>> >> I understood my mistake after modifying the wrong URL with the "/".
>> >> But there is a wrong URL used as an option of crawler_launcher on the
>> >> Apache OODT homepage (http://oodt.apache.org/components/maven/crawler/user/):
>> >> --filemgrUrl http://localhost:9000/ \
>> >> So it made me confused.
>> >>
>> >> I tried to run the command mentioned below, following the Apache OODT
>> >> home page.
>> >> $ ./crawler_launcher --crawlerId MetExtractorProductCrawler
>> >> ERROR: Invalid option: 'crawlerId'
>> >>
>> >> But the error described above occurred.
>> >> Is the option 'crawlerId' obsolete?
>> >>
>> >> Thanks,
>> >> Yunhee
>> >>
>> >>
>> >> 2012/8/7 Mattmann, Chris A (388J) :
>> >>> Perfect, Sheryl, my thoughts exactly.
>> >>>
>> >>> Cheers,
>> >>> Chris
>> >>>
>> >>> On Aug 6, 2012, at 10:01 AM, Sheryl John wrote:
>> >>>
>> >>>> Hi Yunhee,
>> >>>>
>> >>>> Check out this OODT wiki for the crawler:
>> >>>> https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
>> >>>>
>> >>>> Did you try giving 'http://localhost:8000' without the "/" at the end?
>> >>>> Also, specify 'org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory'
>> >>>> for the 'clientTransferer' option.
>> >>>>
>> >>>>
>> >>>> On Mon, Aug 6, 2012 at 9:46 AM, YunHee Kang wrote:
>> >>>>
>> >>>>> Hi Chris,
>> >>>>>
>> >>>>> I got an error message when I tried to run crawler_launcher by using a
>> >>>>> shell script. The error message may be caused by a wrong URL for the
>> >>>>> filemgr.
>> >>>>> $ ./crawler_launcher.sh
>> >>>>> ERROR: Validation Failures: - Value 'http://localhost:8000/' is not
>> >>>>> allowed for option
>> >>>>> [longOption='filemgrUrl',shortOption='fm',description='File Manager
>> >>>>> URL'] - Allowed values = [http://.*:\d*]
>> >>>>>
>> >>>>> The following is the shell script that I wrote:
>> >>>>> $ cat crawler_launcher.sh
>> >>>>> #!/bin/sh
>> >>>>> export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
>> >>>>> ./crawler_launcher \
>> >>>>> -op --launchStdCrawler \
>> >>>>> --productPath $STAGE_AREA \
>> >>>>> --filemgrUrl http://localhost:8000/ \
>> >>>>> --failureDir /tmp \
>> >>>>> --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
>> >>>>> --metFileExtension tmp \
>> >>>>> --clientTransferer
>> >>>>> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferer
>> >>>>>
>> >>>>> I am wondering if there is a problem in the URL of the filemgr or elsewhere.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Yunhee
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> -Sheryl
>> >>>
>> >>>
>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>> Chris Mattmann, Ph.D.
>> >>> Senior Computer Scientist
>> >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> >>> Office: 171-266B, Mailstop: 171-246
>> >>> Email: chris.a.mattmann@nasa.gov
>> >>> WWW: http://sunset.usc.edu/~mattmann/
>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>> Adjunct Assistant Professor, Computer Science Department
>> >>> University of Southern California, Los Angeles, CA 90089 USA
>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >
>>
>
> --
> -Sheryl
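Putting Sheryl's two suggestions together, the Aug 6 script becomes something like the sketch below: the trailing "/" is dropped from the File Manager URL (the option only accepts values matching http://.*:\d*), and --clientTransferer is given the LocalDataTransferFactory class instead of LocalDataTransferer. Everything else is copied verbatim from the thread; whether -op --launchStdCrawler is still the right operation name on a given release is not confirmed here (./crawler_launcher -h lists the current options).

#!/bin/sh
# Sketch of the Aug 6 launch script with the two fixes discussed in this thread:
#   1. no trailing "/" on --filemgrUrl (the value must match http://.*:\d*)
#   2. LocalDataTransferFactory (not LocalDataTransferer) for --clientTransferer
# The staging path and File Manager port are the ones used above and will
# differ on other installs.
export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
./crawler_launcher \
-op --launchStdCrawler \
--productPath $STAGE_AREA \
--filemgrUrl http://localhost:8000 \
--failureDir /tmp \
--actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
--metFileExtension tmp \
--clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory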