jackrabbit-dev mailing list archives

From "Rajesh Upadhyay (JIRA)" <j...@apache.org>
Subject [jira] Created: (JCR-1894) Word doc extraction problem
Date Wed, 03 Dec 2008 10:29:44 GMT
Word doc extraction problem

                 Key: JCR-1894
                 URL: https://issues.apache.org/jira/browse/JCR-1894
             Project: Jackrabbit
          Issue Type: Bug
          Components: jackrabbit-text-extractors
    Affects Versions: core 1.4.3
         Environment: OS: Windows 2003 SP2, MyEclipse 6.0, Tomcat 5.5, and Athelon500+
            Reporter: Rajesh Upadhyay

I have a .doc file that contains data inside a table, and I want to parse the table to get
its values. Normal parsing (I mean using StringTokenizer) does not work for the table,
because it returns unwanted special characters along with the cell text. So I want to
convert the .doc to a .txt file first, since the values would then be easy to split, but I
can't get that to work. Can anyone please tell me how to parse the values of an MS Word table?

We also need to know how to index a .doc file while excluding these special characters.
When we show the excerpt, the special characters make it unreadable.

Thanks in advance.
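
For the table-parsing and indexing question above, here is a minimal sketch of one possible cleanup step. It assumes the unwanted characters are control characters, in particular U+0007, which the binary Word format is understood to use as a table cell mark. That assumption has not been checked against the reporter's file. The class name WordTableText and its methods are hypothetical and are not part of Jackrabbit or its text extractors.

```java
import java.util.ArrayList;
import java.util.List;

public class WordTableText {
    // Assumption: cell boundaries arrive as U+0007 in the raw extracted text.
    private static final char CELL_MARK = '\u0007';

    /** Splits one raw table row into trimmed cell values, dropping other control characters. */
    public static List<String> splitCells(String rawRow) {
        List<String> cells = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (char c : rawRow.toCharArray()) {
            if (c == CELL_MARK) {
                // A cell mark closes the current cell.
                cells.add(current.toString().trim());
                current.setLength(0);
            } else if (!Character.isISOControl(c)) {
                current.append(c);
            }
        }
        // Keep trailing text that was not closed by a cell mark.
        if (current.length() > 0) {
            cells.add(current.toString().trim());
        }
        return cells;
    }

    /** Replaces control characters with spaces and collapses whitespace, for indexing and excerpts. */
    public static String cleanForIndex(String raw) {
        return raw.replaceAll("\\p{Cntrl}", " ").replaceAll("\\s+", " ").trim();
    }
}
```

With this sketch, splitCells would be applied to each row of the raw text to recover the cell values, and cleanForIndex to the whole extracted text before it is indexed or shown in an excerpt.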

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
