oodt-dev mailing list archives

From "Mallder, Valerie" <Valerie.Mall...@jhuapl.edu>
Subject RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
Date Thu, 07 Apr 2016 02:15:03 GMT
I haven't had a chance to study this yet, but after a first pass through this email trail I
suspect that Kostas may be running into the same problem I ran into when Tika was either
introduced or upgraded to a much newer version than the system had previously. I ended
up having to modify my mimetypes.xml file to get around the problem I was having after that
happened. I will look at this in detail tomorrow, compare it to my history of debugging
when I was going from version 0.6 to 0.7 to 0.8 to 0.9 and 0.10, and see if the problem is
what I have seen before. However, I am staying at 0.10, so I won't be able to speak to going
up to version 0.12.

Val



________________________________
From: Chris Mattmann <chris.mattmann@gmail.com>
Sent: Wednesday, April 6, 2016 9:58:15 PM
To: dev@oodt.apache.org
Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Thanks Kostas, they are wire compatible and this is a good
use case.

The crawler should not have undergone much change (if any)
since 0.6, so I am not exactly sure why you were seeing
issues with it. There have definitely been upgrades to CAS-PGE
since 0.6, and maybe that’s what you were running into.


—
Chris Mattmann
chris.mattmann@gmail.com







On 4/6/16, 6:47 PM, "Konstantinos Mavrommatis" <kmavrommatis@celgene.com> wrote:

>I am giving up on this...
>I originally used [1] to set up OODT (v0.6 back then), and my setup on the new system is identical to the old one.
>I could not make much out of [0]. Among other things, I tried copying the files in the old crawler/policy to the new crawler/policy, which included some legacy-cmd-line-options.xml and legacy-cmd-line-actions.xml. I also tried to reinstall the full OODT on the client side, but it still did not work.
>
>I ended up reverting to the older version (0.6), which I run on my client. The server (which runs FM) is still on 0.12, but the combination seems to be working fine.
>
>K
>
>-----Original Message-----
>From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
>Sent: Tuesday, April 05, 2016 3:33 AM
>To: dev@oodt.apache.org
>Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
>
>Hi K,
>OK, so I did a bit of searching here and located a bunch of files which are defined as legacy... You can check out the search results here: https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
>I would urge you to have a look at the AutoDetectProductCrawler Javadoc description in the master branch [0] as well, to see if you've got everything required.
>Finally, I came across some documentation on the wiki which may guide you in the right direction [1]. It may also be outdated, though, so please let us know if that is the case.
>hth
>
>[0] https://github.com/apache/oodt/blob/91d0bafe71124906bd94baad746189caf35fb39c/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java#L40-L64
>[1] https://cwiki.apache.org/confluence/display/OODT/Mime+type+detection+with+the+AutoDetectProductCrawler
>
>On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis <kmavrommatis@celgene.com> wrote:
>
>> Hi,
>> It seems to be happening for a number of the file types that I have in
>> the mimetypes.xml.
>> A few things are puzzling to me: this file, which is a .gz file, is not
>> processed by the regular Tika mimetypes, which include gzip files.
>> A file that has no extension, which defaults to txt, is passed to
>> MetExtractor.pl and processed.
>>
>> Any ideas how I can find which preconditions fail? I tried to
>> change the log level to DEBUG for all components, but I did not get
>> much more information. This must be something that changed in the OODT
>> releases after 0.6, but I could not find anything relevant in the release notes.
>> I also noticed in the documentation of the AutoDetectProductCrawler
>> that it uses the file met-extr-preconditions.xml, which I could not
>> find anywhere in the deployed OODT or the src directories. Could that
>> be a reason for the problem I observe?
>>
>> Thanks
>> K
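One pattern in the crawler log quoted further down may help narrow this: every file that hit "No extractor specs specified" is gzip-compressed (.fastq.gz, .log.gz, .tar.gz), while the extension-less test file and the .met files did reach an extractor. A minimal Python sketch for checking that pattern directly, assuming content sniffing plays a role (nothing in the thread confirms this), labels each file in the crawl directory by whether its content starts with the gzip magic bytes 1f 8b, ignoring its name. The function names are hypothetical and nothing here is part of OODT.

```python
import gzip
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def starts_with_gzip_magic(path: Path) -> bool:
    """True if the file content begins with the gzip magic bytes."""
    with path.open("rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def classify_crawl_dir(crawl_dir) -> dict:
    """Label each regular file directly in crawl_dir (no recursion, like
    --noRecur) as 'gzip' or 'other' by content, ignoring the file name."""
    return {
        entry.name: "gzip" if starts_with_gzip_magic(entry) else "other"
        for entry in sorted(Path(crawl_dir).iterdir())
        if entry.is_file()
    }


def demo() -> dict:
    """Build a throwaway directory resembling the crawl dir and classify it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        with gzip.open(root / "E837642_R1.fastq.gz", "wb") as fh:
            fh.write(b"@read1\nACGT\n+\nIIII\n")
        (root / "E837642_R1.fastq.gz.met").write_text("<cas:metadata/>\n")
        (root / "test").write_text("plain text\n")
        return classify_crawl_dir(root)
```

If the hypothesis holds, running classify_crawl_dir on /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq should flag as gzip exactly the files that failed preconditions.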
>>
>> -----Original Message-----
>> From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
>> Sent: Monday, April 04, 2016 3:24 PM
>> To: dev@oodt.apache.org
>> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor
>> specifications
>>
>> Hi Konstantinos,
>> It appears to be happening with a tar.gz file as well right?
>>
>> WARNING: No extractor specs specified for
>> /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>>
>> I wonder if it is the file names... However, I would be extremely
>> surprised, as I've seen much more verbose file naming.
>> Lewis
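The file-name question can be explored with a toy model. As I understand Tika's detection (worth verifying for the Tika version bundled with 0.12, and only relevant if content-based detection is enabled for the crawler), the content-based type is tried first and the name-based type is only a hint, winning only when it is declared a specialisation (sub-class-of) of the content-based type. Under that assumption a *.fastq.gz glob would lose to gzip magic bytes. The Python sketch below is a simplified model, not Tika's code: the registry is trimmed, application/gzip is a placeholder assumption for Tika's built-in gzip type, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatch

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream

# Simplified stand-in for policy/mimetypes.xml. The "application/gzip" entry is
# an assumed placeholder for the gzip type the real Tika registry defines.
REGISTRY_XML = """
<mime-info>
  <mime-type type="application/gzip">
    <glob pattern="*.gz"/>
  </mime-type>
  <mime-type type="text/fastq">
    <glob pattern="*.fastq"/>
    <glob pattern="*.fastq.gz"/>
  </mime-type>
</mime-info>
"""


def load_registry(xml_text):
    """Return ({type: [glob, ...]}, {type: parent type from <sub-class-of>})."""
    globs, parents = {}, {}
    for mime in ET.fromstring(xml_text).iter("mime-type"):
        name = mime.get("type")
        globs[name] = [g.get("pattern") for g in mime.iter("glob")]
        parent = mime.find("sub-class-of")
        if parent is not None:
            parents[name] = parent.get("type")
    return globs, parents


def type_from_name(file_name, globs):
    """Name-only detection in this model: the longest matching glob wins."""
    best, best_len = None, -1
    for mime, patterns in globs.items():
        for pattern in patterns:
            if fnmatch(file_name, pattern) and len(pattern) > best_len:
                best, best_len = mime, len(pattern)
    return best


def is_a(mime, ancestor, parents):
    """True if mime is ancestor or reaches it through <sub-class-of> links."""
    while mime is not None:
        if mime == ancestor:
            return True
        mime = parents.get(mime)
    return False


def detect(file_name, head, globs, parents):
    """Content first; the name-based type is a hint that only wins when it
    specialises the content-based type."""
    content_type = "application/gzip" if head.startswith(GZIP_MAGIC) else None
    hint = type_from_name(file_name, globs)
    if content_type is None:
        return hint
    if hint is not None and is_a(hint, content_type, parents):
        return hint
    return content_type
```

In this model, adding `<sub-class-of type="application/gzip"/>` inside the text/fastq entry makes detect("E837642_R1.fastq.gz", ...) return text/fastq; without it, the gzip content wins. sub-class-of is real mimetypes.xml syntax, but whether it is the kind of change Val made is not stated in the thread.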
>>
>> On Saturday, April 2, 2016, Konstantinos Mavrommatis <kmavrommatis@celgene.com> wrote:
>>
>> > Hi,
>> > I am trying to replicate a fully functional service that I set up
>> > a long time ago using OODT 0.6, but I am having the following problem,
>> > which does not allow me to ingest files. When I try to ingest files
>> > with the extension fastq.gz, I get the line:
>> > WARNING: No extractor specs specified for
>> > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>> > And of course the file is not ingested. This process works without
>> > problems with OODT 0.6 on a different server.
>> >
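The mime-extractor map is keyed by the detected MIME type, so a warning like "No extractor specs specified" is what one would expect whenever detection returns a type that has no `<mime>` entry, for example a gzip type instead of text/fastq. The Python sketch below models that lookup with exact string matching, using the text/fastq entry from the mime-extractor-map.xml quoted below. It is not OODT's own config reader: the `<mimetypemap>` root element is a hypothetical wrapper (the posted fragment omits the real root), and whether OODT falls back to parent types is not modeled.

```python
import xml.etree.ElementTree as ET

# The text/fastq entry posted in this thread, wrapped in a hypothetical
# <mimetypemap> root element so that it parses on its own.
MAP_XML = """
<mimetypemap>
  <mime type="text/fastq">
    <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
      <config file="/apache-oodt/crawler/bin/fastq.config"/>
      <preCondComparators>
        <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
      </preCondComparators>
    </extractor>
  </mime>
</mimetypemap>
"""


def load_extractor_specs(xml_text):
    """Map mime type -> list of dicts describing each configured extractor."""
    specs = {}
    for mime in ET.fromstring(xml_text).iter("mime"):
        entries = []
        for extractor in mime.iter("extractor"):
            config = extractor.find("config")
            entries.append({
                "class": extractor.get("class"),
                "config": config.get("file") if config is not None else None,
                "preconditions": [p.get("id") for p in extractor.iter("preCondComparator")],
            })
        specs[mime.get("type")] = entries
    return specs


def specs_for(detected_type, specs):
    """Exact-match lookup; in this model an unmapped type yields an empty
    list, the case that corresponds to "No extractor specs specified"."""
    return specs.get(detected_type, [])
```

Looking up application/gzip returns an empty list here, which matches the behaviour seen for every .gz file in the log.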
>> > The crawler command I am running is:
>> > ./crawler_launcher \
>> >     --operation \
>> >     --launchAutoCrawler \
>> >     --productPath $FILEPATH \
>> >     --filemgrUrl $OODT_FILEMGR_URL \
>> >     --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
>> >     --mimeExtractorRepo ../policy/mime-extractor-map.xml \
>> >     --noRecur \
>> >     --crawlForDirs 2>&1
>> >
>> >
>> >
>> > I have set up OODT 0.12 on a server that runs FM listening on port 9000.
>> > From a client machine, I have verified that I can use FM to ingest products.
>> > I am now trying to use the crawler to crawl and ingest all files in a
>> > directory. Since I have non-standard MIME types in these directories,
>> > I have done the following:
>> > 1. Added my own MIME types to policy/mimetypes.xml, e.g.:
>> >   <mime-type type="text/fastq">
>> >                 <glob pattern="*.fastq"/>
>> >                 <glob pattern="*.fastq.gz"/>
>> >                 <glob pattern="*.fastq.bz"/>
>> >                 <glob pattern="*.fastq.bz2"/>
>> >                 <glob pattern="*.fastq.bzip"/>
>> >                 <glob pattern="*.fq"/>
>> >                 <glob pattern="*.fq.gz"/>
>> >                 <glob pattern="*.fq.bz"/>
>> >                 <glob pattern="*.fq.bz2"/>
>> >                 <glob pattern="*.fq.bzip"/>
>> >         </mime-type>
>> > 2. Created the file policy/mime-extractor-map.xml:
>> >
>> >         <mime type="text/fastq">
>> >                 <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
>> >                         <config file="/apache-oodt/crawler/bin/fastq.config"/>
>> >                         <preCondComparators>
>> >                                 <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
>> >                         </preCondComparators>
>> >                 </extractor>
>> >         </mime>
>> >
>> > 3. Created the file fastq.config:
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>> >   <exec workingDir="">
>> >     <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
>> >     <args>
>> >       <arg isDataFile="true"></arg>
>> >       <arg>fastq</arg>
>> >     </args>
>> >   </exec>
>> > </cas:externextractor>
>> >
>> >
>> >
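The "Executing command line: [...]" lines in the log suggest how this config is used: the product path goes where isDataFile="true" sits, followed by the literal arguments. A small Python sketch, inferred from those log lines rather than taken from ExternMetExtractor's code (build_command is hypothetical), builds the argv a text/fastq product would get under fastq.config. The comparison with the log is informative: the files that did reach MetExtractorNGS.pl were called with a trailing `text` argument rather than `fastq`, which suggests they matched some other mime entry, not text/fastq.

```python
import xml.etree.ElementTree as ET

CAS_NS = "http://oodt.jpl.nasa.gov/1.0/cas"

# fastq.config as posted in this thread (XML declaration omitted).
CONFIG_XML = """<cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <exec workingDir="">
    <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
    <args>
      <arg isDataFile="true"></arg>
      <arg>fastq</arg>
    </args>
  </exec>
</cas:externextractor>
"""


def build_command(config_xml, data_file):
    """Approximate the argv an extern extractor config yields: the binary,
    then each <arg>, with isDataFile="true" replaced by the product path."""
    root = ET.fromstring(config_xml)
    if root.tag != f"{{{CAS_NS}}}externextractor":
        raise ValueError(f"unexpected root element {root.tag}")
    exec_el = root.find("exec")
    argv = [exec_el.findtext("extractorBinPath").strip()]
    for arg in exec_el.iter("arg"):
        if arg.get("isDataFile") == "true":
            argv.append(data_file)
        else:
            argv.append((arg.text or "").strip())
    return argv
```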
>> > MetExtractorNGS.pl is a small Perl script that opens the file to
>> > be ingested, gets some information, and stores it in the .met file
>> > that corresponds to the file to be ingested. I have manually
>> > verified that it works as expected, producing the correct .met file.
>> >
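A rough Python equivalent of what the script is described as doing may help when comparing .met output between the 0.6 and 0.12 setups. It writes `<file>.met` using the cas:metadata / keyval / key / val layout that, as far as I know, CAS .met files use (check it against a .met file your 0.6 install produced), and it refuses .met inputs the way the script's WARN line in the log shows. write_met_file and read_met_file are hypothetical helpers, not OODT APIs.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

CAS_NS = "http://oodt.jpl.nasa.gov/1.0/cas"
ET.register_namespace("cas", CAS_NS)  # serialise the root as cas:metadata


def write_met_file(data_file, metadata):
    """Write <data_file>.met next to the data file; refuse .met inputs,
    mirroring the check MetExtractorNGS.pl performs."""
    path = Path(data_file)
    if path.suffix == ".met":
        raise ValueError(f"{path}: .met files are not processed")
    root = ET.Element(f"{{{CAS_NS}}}metadata")
    for key, values in metadata.items():
        keyval = ET.SubElement(root, "keyval")
        ET.SubElement(keyval, "key").text = key
        for value in values:
            ET.SubElement(keyval, "val").text = value
    met_path = path.with_name(path.name + ".met")
    ET.ElementTree(root).write(str(met_path), encoding="UTF-8", xml_declaration=True)
    return met_path


def read_met_file(met_path):
    """Parse a .met file back into {key: [values]}."""
    root = ET.parse(str(met_path)).getroot()
    return {kv.findtext("key"): [v.text for v in kv.findall("val")]
            for kv in root.findall("keyval")}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "test"
        data.write_text("example\n")
        met = write_met_file(data, {"ProductType": ["GenericFile"]})
        print(met.name, read_met_file(met))
```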
>> > What am I missing here? Any ideas, comments, or suggestions will be
>> > greatly appreciated.
>> > Thanks in advance for any help
>> > Kostas
>> >
>> >
>> >
>> > PS1 The full output from running the crawler command follows:
>> >
>> >
>> > Setting property 'StdProductCrawler.filemgrUrl'
>> > Setting property 'MetExtractorProductCrawler.filemgrUrl'
>> > Setting property 'AutoDetectProductCrawler.filemgrUrl'
>> > Setting property 'StdProductCrawler.clientTransferer'
>> > Setting property 'MetExtractorProductCrawler.clientTransferer'
>> > Setting property 'AutoDetectProductCrawler.clientTransferer'
>> > Setting property 'StdProductCrawler.noRecur'
>> > Setting property 'MetExtractorProductCrawler.noRecur'
>> > Setting property 'AutoDetectProductCrawler.noRecur'
>> > Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
>> > Setting property 'StdProductCrawler.productPath'
>> > Setting property 'MetExtractorProductCrawler.productPath'
>> > Setting property 'AutoDetectProductCrawler.productPath'
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'AutoDetectProductCrawler.noRecur' set to value [true]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'StdProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'MetExtractorProductCrawler.noRecur' set to value [true]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'StdProductCrawler.noRecur' set to value [true]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'StdProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'AutoDetectProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'MetExtractorProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>> > INFO: Crawling /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq
>> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz]
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met]
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met will be ignored. .met files are not processed !
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
>> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>> >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>> >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>> >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>> >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>> >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>> >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>> >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>> >
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz]
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met]
>> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met will be ignored. .met files are not processed !
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
>> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>> >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>> >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>> >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>> >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>> >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>> >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>> >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>> >
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz]
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz]
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz]
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing NGS server at http://192.168.8.44:8082/RPC2
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for file_host are not in array format.Converting..
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [file_host]/[ip-192-168-8-66]
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ProductType are not in array format.Converting..
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>> > INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
>> > INFO: StdIngester: connected to file manager: [http://192.168.8.44:9000]
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
>> > INFO: In Place Data Transfer to: [http://192.168.8.44:9000] enabled
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
>> > INFO: StdIngester: ingesting product: ProductName: [test]: ProductType: [GenericFile]: FileLocation: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/]
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
>> > FINEST: File Manager Client: clientTransfer enabled: transfering product [test]
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.versioning.VersioningUtils createBasicDataStoreRefsFlat
>> > FINE: VersioningUtils: Generated data store ref: file:/opt/oodt/data/archive/test/test from origRef: file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>> > INFO: Successfully ingested product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Successful ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> >
>> >
>> > *********************************************************
>> > THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS CONFIDENTIAL AND
>> > MAY CONTAIN LEGALLY PRIVILEGED INFORMATION INTENDED ONLY FOR THE USE
>> > OF THE INDIVIDUAL OR INDIVIDUALS NAMED ABOVE.
>> > If the reader is not the intended recipient, or the employee or
>> > agent responsible to deliver it to the intended recipient, you are
>> > hereby notified that any dissemination, distribution or copying of
>> > this communication is strictly prohibited. If you have received this
>> > communication in error, please reply to the sender to notify us of
>> > the error and delete the original message. Thank You.
>> >
>>
>>
>> --
>> *Lewis*
>>
>
>
>
>--
>*Lewis*

