oodt-dev mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: TikaCmdLineMetExtractor does not generate .met file
Date Tue, 04 Nov 2014 04:35:37 GMT
Hi Zichen,

Answers below:



-----Original Message-----
From: Zichen Nie <zichennie@gmail.com>
Date: Monday, November 3, 2014 at 10:32 AM
To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
Subject: Re: TikaCmdLineMetExtractor does not generate .met file

>Yes. That totally makes sense! But here comes the question:
>1. Do we have to generate .met files for our homework 2? How do we make
>use of .met files?

You only have to generate .met files if your crawl needs them. If you
are using MetExtractorProductCrawler, then no, you don't need a .met file.

>2. How do we customize the metadata files we want to create? In the cas-pge
>example, I don't understand why there are key and value pairs like "less
>than" and "%3C" in the final met file. The corresponding metadata
>configuration specifies: <customMetadata><metadata
>key="LessThan" val="&#x3C;"/>...</customMetadata>. But if I add some
>other key of my own, it fails to show up in the final met file.

Please read:

https://cwiki.apache.org/confluence/display/OODT/Understanding+the+flow+of+Metadata+during+PGE+based+Processing

https://cwiki.apache.org/confluence/display/OODT/Understanding+CAS-PGE+Metadata+Precendence
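
On the "%3C" value specifically: "&#x3C;" is the XML character reference
for "<", and "%3C" is the percent-encoded (URL-encoded) form of the same
character. That is consistent with the value being URL-encoded when the met
file is written, though that is an inference from the two strings rather
than something stated in the pages above. A minimal Python sketch of the
two encodings, using only the standard library:

```python
import html
from urllib.parse import quote, unquote

# Value as written in the config: an XML character reference for "<".
configured = html.unescape("&#x3C;")

# Percent-encode it, which is what "%3C" in the met file looks like.
encoded = quote(configured, safe="")

# Decoding the percent-encoded form gives back the original character.
decoded = unquote(encoded)

print(configured, encoded, decoded)  # prints: < %3C <
```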



> 
>3. When I tried to create my own workflow using cas-pge, for example,
>to just copy a file to another place, the terminal said "ingestion is
>failed" because of "missing required metadata". However, in my
>destination folder the new file is copied and a .met file is
>generated. There is only one key-value pair in my .met file, which is
>JobID. I am really confused.

requiredMetadata is defined in the workflow manager tasks.xml on a per
task basis. Please refer to the requiredMetadata for your task and confirm
that you provided it.
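
One quick way to confirm this is to compare the keys in the generated .met
file against the keys your task requires. The Python sketch below is only
an illustration under stated assumptions: it assumes the .met file is XML
with <keyval> elements holding one <key> and one or more <val> children,
and that keys and values may be URL-encoded. The sample document, the job
ID, and the required key list (including "ProductType") are hypothetical;
substitute the fields listed for your task. It does not read tasks.xml.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def read_met(xml_text):
    """Return {key: [values]} from a met document.

    Assumes a keyval/key/val layout; this layout is an assumption.
    """
    met = {}
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "keyval":
            continue
        key, vals = None, []
        for child in elem:
            name = local_name(child.tag)
            if name == "key":
                key = unquote_plus(child.text or "")
            elif name == "val":
                vals.append(unquote_plus(child.text or ""))
        if key is not None:
            met.setdefault(key, []).extend(vals)
    return met


def missing_keys(xml_text, required):
    """List the required keys that are absent or have no values."""
    met = read_met(xml_text)
    return [k for k in required if not met.get(k)]


# Hypothetical met file containing only JobID, as in the question.
SAMPLE = """<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <keyval>
    <key>JobID</key>
    <val>hypothetical-job-0001</val>
  </keyval>
</cas:metadata>"""

# Hypothetical required list; use the one defined for your task.
REQUIRED = ["JobID", "ProductType"]

print(missing_keys(SAMPLE, REQUIRED))  # prints: ['ProductType']
```

Any key this reports as missing is a candidate for the "missing required
metadata" message.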

>
>
>I must be missing something in the configuration process. I have seen the
>message "missing required metadata" many times, even during
>successful ingestions. Any suggestions?

See above.

Cheers,
Chris

>
>
>Best,
>Zichen 

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



>
>
>
>
>2014-11-02 19:31 GMT-08:00 Mattmann, Chris A (3980)
><chris.a.mattmann@jpl.nasa.gov>:
>
>Hi Zichen,
>
>Thanks for your mail. If you use MetExtractorProductCrawler, met
>is generated, but it's never serialized to disk. I think that explains
>it. Let me know if that makes sense.
>
>Cheers,
>Chris
>
>
>-----Original Message-----
>From: Zichen Nie <zichennie@gmail.com>
>Date: Saturday, November 1, 2014 at 4:47 PM
>To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
>Subject: TikaCmdLineMetExtractor does not generate .met file
>
>>Dear Professor:
>>
>>I followed the instructions on how to use the OODT cas-crawler, and tried
>>to generate a .met file using TikaCmdLineMetExtractor.
>>I can see from the log that Tika is extracting my metadata, but it does
>>not generate a .met file for my JSON file.
>>
>>Here is my command line:
>>
>>
>>./crawler_launcher --operation --launchMetCrawler -filemgrUrl
>>http://localhost:9000 --clientTransferer
>>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
>>--productPath
>>/Users/threeears/Documents/572/Assignment2/oodt-deploy/cas-crawler-0.7/data/test/0.json
>>--metExtractor
>>org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor
>>--metExtractorConfig
>>/Users/threeears/Documents/572/Assignment2/oodt-deploy/cas-crawler-0.7/extractors/tikaextractor/tikaextractor.config
>>--metFileExtension met
>>
>>
>>
>>I thought MetCrawler should generate a met file before ingestion. It's
>>weird that my ingestion is successful but no met file is shown. Am I
>>using the right extractor and crawler? Are there any necessary
>>configurations that I missed?
>>
>>
>>Best,
>>Zichen
>>
