hadoop-user mailing list archives

From Håvard Wahl Kongsgård <haavard.kongsga...@gmail.com>
Subject Re: Reading multiple lines from a microsoft doc in hadoop
Date Fri, 24 Aug 2012 06:07:39 GMT
It's much easier if you convert the documents to text first


or some other doc parser
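
Once the documents are plain text, splitting them into entries on empty lines is straightforward. The following is only a minimal sketch, assuming the conversion to text has already been done by an external tool (not shown here); the function name split_entries is made up for illustration and is not part of Hadoop.

```python
import re


def split_entries(text):
    """Split plain text into entries separated by one or more blank lines.

    Hypothetical helper: assumes `text` is the already-converted content
    of one document.
    """
    # Normalize Windows and old Mac line endings so "\n" is the only newline.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # A separator is a newline followed by one or more lines that contain
    # nothing but spaces or tabs.
    chunks = re.split(r"\n(?:[ \t]*\n)+", normalized)
    # Drop empty chunks and trim surrounding whitespace from each entry.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    sample = "first line\nsecond line\n\nnext entry\r\n\r\nlast"
    for entry in split_entries(sample):
        print(repr(entry))
```

The same logic could run inside a mapper (for example via Hadoop Streaming) if each converted file is small enough to be read whole. For larger text files, Hadoop versions that support the textinputformat.record.delimiter property may let TextInputFormat split records on "\n\n" directly, though that only works if the blank lines are truly empty and the line endings are consistent.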


On Fri, Aug 24, 2012 at 7:52 AM, Siddharth Tiwari
<siddharth.tiwari@live.com> wrote:
> hi,
> I have files in MS Word .doc and .docx format. They contain entries that are
> separated by an empty line. Is it possible to read these entries one at a
> time, split on the empty lines? Also, which InputFormat should I use to read
> .doc and .docx files? Please help.
> *------------------------*
> Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
> "Every duty is holy, and devotion to duty is the highest form of worship of
> God."
> "Maybe other people will try to limit me but I don't limit myself"

Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences

