lucene-java-user mailing list archives

From "Nader S. Henein" <...@bayt.net>
Subject RE: converting text/doc to XML
Date Tue, 08 Jul 2003 05:54:46 GMT
XML is an organized, standardized format so let's say your document has
the following characteristics

File name : foobar.doc
First line title : Foo Bar
File content :
	Blah blah blah blah 
	Blah blah blah blah 
	Blah blah blah blah 
	Blah blah blah blah 

Then you have to read the file (a simple file read; Java can do this in
about ten different ways, pick one)
and put each of the file's characteristics in a variable

And then write it out as valid XML (attribute values must be quoted):
<doc doc_id="1">
	<file_name>foobar.doc</file_name>
	<title>Foo Bar</title>
	<content>
	Blah blah blah blah 
	Blah blah blah blah 
	Blah blah blah blah 
	Blah blah blah blah 
	</content>
</doc>


There are probably packages that will do this for you, but it's so simple
you could pull it off in under a hundred lines. It's also a good exercise
to familiarize yourself with XML (if you haven't played around with it
before).
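
To make the steps above concrete, here is a minimal sketch of that
approach, not a definitive implementation. It assumes the input is
already plain text (a binary Word .doc would need a text extractor
first), that the first line is the title, and that the file is UTF-8.
The class name TextToXml and the methods escape and toXml are made up
for this example, and it uses java.nio.file, so it needs Java 7 or
later. Characters that are illegal in XML (most control characters)
are not filtered out here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper class, named for this example only.
public class TextToXml {

    // Escape the five characters that have special meaning in XML.
    // "&" must be replaced first so the other entities are not re-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Build the <doc> element. Assumption: the first line is the title
    // and every remaining line is content.
    static String toXml(int docId, String fileName, List<String> lines) {
        String title = lines.isEmpty() ? "" : lines.get(0);
        StringBuilder sb = new StringBuilder();
        sb.append("<doc doc_id=\"").append(docId).append("\">\n");
        sb.append("\t<file_name>").append(escape(fileName)).append("</file_name>\n");
        sb.append("\t<title>").append(escape(title)).append("</title>\n");
        sb.append("\t<content>\n");
        for (int i = 1; i < lines.size(); i++) {
            sb.append("\t").append(escape(lines.get(i))).append("\n");
        }
        sb.append("\t</content>\n");
        sb.append("</doc>\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TextToXml <text-file>");
            return;
        }
        Path path = Paths.get(args[0]);
        // Read the whole file into memory, one list entry per line.
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        System.out.print(toXml(1, path.getFileName().toString(), lines));
    }
}
```

Running it as "java TextToXml foobar.txt" prints the XML to standard
output. The escaping step is the part that is easy to forget: without
it, any "&" or "<" in the text makes the result invalid XML.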



-----Original Message-----
From: Jagdip Singh [mailto:jxs1878@cs.rit.edu] 
Sent: Tuesday, July 08, 2003 9:41 AM
To: 'Lucene Users List'
Subject: converting text/doc to XML


Hi,
How can I convert text/doc to XML?
Please help.
 
Regards, 
Jagdip



