stanbol-dev mailing list archives

From <arthi.ven...@wipro.com>
Subject RE: Exception while installing metaxa
Date Mon, 16 Sep 2013 05:39:29 GMT
Thanks a lot Rupert,
 Using --data-binary in the call worked like a charm.
Now I am able to enhance both PDF and Word documents.

Thanking you and Rgds,
Arthi


-----Original Message-----
From: Rupert Westenthaler [mailto:rupert.westenthaler@gmail.com] 
Sent: Monday, September 16, 2013 10:56 AM
To: dev@stanbol.apache.org
Subject: Re: Exception while installing metaxa

Hi Arthi,

I should have noticed this in my first response. When sending a binary document you need to
use "--data-binary @{file}" instead of "--data", as "--data" is shorthand for "--data-ascii".
I made a short test on

    http://dev.iks-project.eu:8081/enhancer/chain/dbpedia-proper-noun

with "--data" and a PDF document and got an empty response. With "--data-binary" I got the
expected results.
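
A minimal sketch of the working call, assuming a POSIX shell with curl installed. The function name enhance_file is hypothetical; the headers, file name and chain URL are taken from the commands quoted further down in this thread. When curl reads a file via "--data @file" it strips carriage returns and newlines, which corrupts binary formats such as PDF or Word, while "--data-binary" posts the bytes unchanged:

```shell
# Hypothetical helper (not part of Stanbol or curl): POST a file unmodified
# to a Stanbol enhancement chain and print the Turtle response.
#   $1 = file to enhance, $2 = Content-type of that file, $3 = chain URL
enhance_file() {
  curl -X POST \
    -H "Accept: text/turtle" \
    -H "Content-type: $2" \
    --data-binary "@$1" \
    "$3"
}

# Example call (needs a running Stanbol instance, so it is left commented out):
# enhance_file Testpdf.pdf application/pdf \
#   "http://localhost:8080/enhancer/chain/MyCustomChain1"
```

The same function should work for the .doc and .docx files by passing application/msword or application/vnd.openxmlformats-officedocument.wordprocessingml.document as the second argument.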

If this does not solve your problem, try removing the "optional" flag from the "tika"
engine in your chain, so that the enhancement process fails when Tika cannot process the
document. If "tika" is marked as optional, errors are only logged and processing continues.

We had some issues with the Tika engine related to XML-based office documents (e.g. STANBOL-810,
STANBOL-970), but as PDF files also do not work for you, I expect that your issues are caused
by something different.

Feel free to also test on the dev.iks-project.eu server, e.g.

    http://dev.iks-project.eu:8081/enhancer
    http://dev.iks-project.eu:8081/enhancer/chain/dbpedia-proper-noun

best
Rupert

On Sat, Sep 14, 2013 at 11:36 AM,  <arthi.venkat@wipro.com> wrote:
> Hi Rupert,
>   I tried the different MIME types, but with no luck.
> The same call with plain text works fine.
> For example, the command below returns the enhancements:
>
> curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" 
> --data @FileToAugment.txt 
> http://localhost:8080/enhancer/chain/MyCustomChain1
>
> However none of the below three commands give a response.
> curl -X POST -H "Accept: text/turtle" -H "Content-type: application/msword"
> --data @TextToEnhance97ver.doc "http://localhost:8080/enhancer/chain/MyCustomChain1"
>
> curl -X POST -H "Accept: text/turtle"
> -H "Content-type: application/vnd.openxmlformats-officedocument.wordprocessingml.document"
> --data @TextToEnhance.docx "http://localhost:8080/enhancer/chain/MyCustomChain1"
>
> curl -X POST -H "Accept: text/turtle" -H "Content-type: application/pdf"
> --data @Testpdf.pdf "http://localhost:8080/enhancer/chain/MyCustomChain1"
>
> Please do share any pointers which you have on this.
> Note : MyCustomChain1  has below configuration :
>
>     tika ( optional , TikaEngine)
>     langdetect ( required , LanguageDetectionEnhancementEngine)
>     opennlp-sentence ( required , OpenNlpSentenceDetectionEngine)
>     opennlp-token ( required , OpenNlpTokenizerEngine)
>     opennlp-pos ( required , OpenNlpPosTaggingEngine)
>     opennlp-chunker ( required , OpenNlpChunkingEngine)
>     MyLinkingEngine ( required , EntityLinkingEngine)
>
>
> Thanking you and Regards,
> Arthi
>
>
>
>
> -----Original Message-----
> From: arthi venkataraman (WT01 - CTO Office)
> Sent: Saturday, September 14, 2013 12:49 PM
> To: dev@stanbol.apache.org
> Subject: RE: Exception while installing metaxa
>
> Thanks a lot Rupert
> I will check the Content type  and re-try.
>
> Thanks and Rgds,
> Arthi
>
>
> -----Original Message-----
> From: Rupert Westenthaler [mailto:rupert.westenthaler@gmail.com]
> Sent: Saturday, September 14, 2013 12:46 PM
> To: dev@stanbol.apache.org
> Subject: Re: Exception while installing metaxa
>
> On Sat, Sep 14, 2013 at 9:05 AM,  <arthi.venkat@wipro.com> wrote:
>> Thanks a lot Rupert for response.
>> The reason for using Metaxa is that I want to enhance pdf and word documents using Stanbol.
>> I read that to process pdf and word we would need metaxa in the pipeline.
>
> The Tika Engine is also able to process Microsoft Word and PDF documents. Just have a
> look at the supported media types of Apache Tika.
>
>>
>> In the contenthub UI of Stanbol I am able to attach a pdf / word doc and enhance it.
>> However, when I try the same from the command line using curl it fails.
>>
>> Any idea how I could use Stanbol to enhance a word / pdf file from the command line or,
>> alternatively, a simple Java program?
>>
>> I tried the calls below, but none of them work:
>>
>> curl -i -X POST -H "Content-Type:text/plain" --data @TextToEnhance.docx
>> "http://localhost:8080/contenthub/contenthub/store?uri=urn:my-content-item2&chain=MyCustomChain1"
>> -u admin:admin
>>
>> curl -i -X POST -H "Content-Type:application/word" --data @TextToEnhance.docx
>> "http://localhost:8080/contenthub/contenthub/store?uri=urn:my-content-item2&chain=MyCustomChain1"
>> -u admin:admin
>>
>
> I need to go offline and do not have time to validate this answer, but 
> IMO this fails because the content type for docx is not 
> application/word. See [1] for a list of Content-Types for the new XML 
> based MS office formats
>
> best
> Rupert
>
> [1]
> http://stackoverflow.com/questions/4212861/what-is-a-correct-mime-type-for-docx-pptx-etc
>
>>
>> Thanks and Rgds,
>> Arthi
>>
>>
>> -----Original Message-----
>> From: Rupert Westenthaler [mailto:rupert.westenthaler@gmail.com]
>> Sent: Saturday, September 14, 2013 12:27 PM
>> To: dev@stanbol.apache.org
>> Subject: Re: Exception while installing metaxa
>>
>> Hi Arthi,
>>
>> I have not used Metaxa for a while. Typically you should use the Tika engine [1]
>> (based on Apache Tika) for processing non-plain-text documents.
>>
>> To use it (with the default configuration) it is usually sufficient to include "tika"
>> in your enhancement chain. If you are configuring a ListChain, the "tika" engine
>> needs to come first in the list.
>> In the case of a WeightedChain, the ordering in the config does not matter.
>>
>> best
>> Rupert
>>
>>
>> [1]
>> http://stanbol.apache.org/docs/trunk/components/enhancer/engines/tikaengine
>>
>> On Fri, Sep 13, 2013 at 1:45 PM,  <arthi.venkat@wipro.com> wrote:
>>> Hi,
>>>    I am trying to install the metaxa bundle.
>>>
>>> I did a mvn clean install in the stanbol\enhancement-engines\metaxa directory.
>>> From the http://localhost:8080/system/console/bundles menu I installed the metaxa jar.
>>>
>>> I got the below exception in the Stanbol window. Any idea how this issue can be fixed?
>>>
>>> ERROR: Bundle org.apache.stanbol.enhancer.engines.metaxa [253]: Error starting/stopping bundle.
>>> (org.osgi.framework.BundleException: Unresolved constraint in bundle
>>> org.apache.stanbol.enhancer.engines.metaxa [253]: Unable to resolve 253.0:
>>> missing requirement [253.0] package; (package=javax.microedition.io))
>>> org.osgi.framework.BundleException: Unresolved constraint in bundle
>>> org.apache.stanbol.enhancer.engines.metaxa [253]: Unable to resolve 253.0:
>>> missing requirement [253.0] package; (package=javax.microedition.io)
>>>         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>>>         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>>>         at org.apache.felix.framework.Felix.setBundleStartLevel(Felix.java:1333)
>>>         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:270)
>>>         at java.lang.Thread.run(Unknown Source)
>>>
>>>
>>>
>>> Thanks and Rgds,
>>> Arthi
>>>
>>>
>>> Please do not print this email unless it is absolutely necessary.
>>>
>>> The information contained in this electronic message and any attachments to this
>>> message are intended for the exclusive use of the addressee(s) and may contain proprietary,
>>> confidential or privileged information. If you are not the intended recipient, you should
>>> not disseminate, distribute or copy this e-mail. Please notify the sender immediately and
>>> destroy all copies of this message and any attachments.
>>>
>>> WARNING: Computer viruses can be transmitted via email. The recipient should
>>> check this email and any attachments for the presence of viruses. The company accepts no
>>> liability for any damage caused by any virus transmitted by this email.
>>>
>>> www.wipro.com
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>