cocoon-dev mailing list archives

From Stephan Michels <step...@vern.chem.tu-berlin.de>
Subject Re: text parser
Date Wed, 13 Feb 2002 11:18:34 GMT


On Wed, 13 Feb 2002, Andrew Answer wrote:

> Hello Stephan,
>
>   is a good idea! I am currently converting many text documents to XML
>   using PHP scripts offline...
>   Some names for your parser: txt2xml (simple and clear),

A project with this name already exists:
http://xml.gsfc.nasa.gov/ingest_demo/txt2XML.html

>   JTF (Java Text Formatter),

Look at JTF.org: Jewish Task Force ;-)

>   JTC (Java Text Converter).

http://www.jtc.com/ is also taken.

Finding a name isn't as easy as I thought. :(

>   Also look at APTConvert
>   (http://www.xmlmind.com/aptconvert/distrib/docs/userguidetoc.html),
>   maybe this tool can help you.

I think my project could help you.

An example grammar looks like this:
<grammar>
 <tokens>

  <token tsymbol="id">
   <concat>
    <cc><ci min="A" max="Z"/><ci min="a" max="z"/></cc>
    <cc minOccurs="0" maxOccurs="*">
     <ci min="A" max="Z"/><ci min="a" max="z"/><ci min="0" max="9"/>
     <cs content="_"/>
    </cc>
   </concat>
  </token>

  <token tsymbol="mult" assoc="right">
   <string content="*"/>
  </token>

  <token tsymbol="plus" assoc="left">
   <string content="+"/>
  </token>

  <token tsymbol="dopen">
   <string content="("/>
  </token>

  <token tsymbol="dclose">
   <string content=")"/>
  </token>

 </tokens>

 <whitespace>
  <cc maxOccurs="*"><cs content="&#10;&#13;&#9;&#32;"/></cc>
 </whitespace>

 <productions>

  <production ntsymbol="E">
   <ntsymbol name="E"/><tsymbol name="plus"/><ntsymbol name="E"/>
  </production>

  <production ntsymbol="E">
   <ntsymbol name="E"/><tsymbol name="mult"/><ntsymbol name="E"/>
  </production>

  <production ntsymbol="E">
   <tsymbol name="dopen"/><ntsymbol name="E"/><tsymbol name="dclose"/>
  </production>

  <production ntsymbol="E">
   <tsymbol name="id"/>
  </production>

 </productions>

 <ssymbol ntsymbol="E"/>
</grammar>
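
For readers more used to regular expressions, the token and whitespace definitions in this grammar can be read roughly as the patterns below. This is only a sketch in Python and not part of the project: the names TOKEN_PATTERNS, WHITESPACE, MASTER and tokenize are made up for illustration, and it assumes that <cc> is a character class, <ci> a character interval, <cs> a character set and <concat> a sequence.

```python
import re

# Hypothetical regex equivalents of the <token> elements (names are the tsymbol values).
TOKEN_PATTERNS = [
    ("id", r"[A-Za-z][A-Za-z0-9_]*"),  # <concat> of two <cc> character classes
    ("mult", r"\*"),
    ("plus", r"\+"),
    ("dopen", r"\("),
    ("dclose", r"\)"),
]

# <whitespace>: one or more of line feed, carriage return, tab, space.
WHITESPACE = re.compile(r"[\n\r\t ]+")

# One alternation with a named group per token symbol.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(text):
    """Return a list of (tsymbol, matched_text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        # lastgroup is the name of the alternative that matched.
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Calling tokenize("A*b + c") yields the pairs for id, mult, id, plus, id in that order; the spaces are dropped, as the <whitespace> element suggests.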

This grammar converts the string "A*b+c*D+(e+F)*G" to the following XML:

<E>
 <E>
  <E>
   <E>
    <id>A</id>
   </E>
   <mult>*</mult>
   <E>
    <id>b</id>
   </E>
  </E>
  <plus>+</plus>
  <E>
   <E>
    <id>c</id>
   </E>
   <mult>*</mult>
   <E>
    <id>D</id>
   </E>
  </E>
 </E>
 <plus>+</plus>
 <E>
  <E>
   <dopen>(</dopen>
   <E>
    <E>
     <id>e</id>
    </E>
    <plus>+</plus>
    <E>
     <id>F</id>
    </E>
   </E>
   <dclose>)</dclose>
  </E>
  <mult>*</mult>
  <E>
   <id>G</id>
  </E>
 </E>
</E>
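
To make the shape of this tree easier to follow, here is a small Python sketch that builds the same nesting for the example string. It is not how the project's parser works (the mail does not say which algorithm is used); it swaps in plain precedence climbing. The names OPERATORS, parse and to_xml are hypothetical, and the assumption that "mult" binds tighter than "plus" is read off the tree above rather than stated in the grammar.

```python
import re

# Assumed precedence: "mult" binds tighter than "plus" (inferred from the tree).
# The assoc values are copied from the <token> elements of the grammar.
OPERATORS = {"+": ("plus", 1, "left"), "*": ("mult", 2, "right")}


def parse(text):
    """Build a nested (name, content) tree for the expression grammar E."""
    # Simplification: characters that match no token (e.g. whitespace) are skipped.
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9_]*|[*+()]", text)
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = expression(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("expected ')'")
            pos += 1
            # E -> dopen E dclose
            return ("E", [("dopen", "("), inner, ("dclose", ")")])
        if tok in OPERATORS or tok == ")":
            raise ValueError(f"unexpected token {tok!r}")
        # E -> id
        return ("E", [("id", tok)])

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            name, prec, assoc = OPERATORS[tokens[pos]]
            if prec < min_prec:
                break
            op = tokens[pos]
            pos += 1
            # Left-associative operators must not re-enter at the same level.
            right = expression(prec + 1 if assoc == "left" else prec)
            # E -> E plus E  or  E -> E mult E
            left = ("E", [left, (name, op), right])
        return left

    tree = expression(1)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


def to_xml(node, depth=0):
    """Render the tree as lines indented by one space per level (no XML escaping)."""
    name, content = node
    pad = " " * depth
    if isinstance(content, str):
        return [f"{pad}<{name}>{content}</{name}>"]
    lines = [f"{pad}<{name}>"]
    for child in content:
        lines.extend(to_xml(child, depth + 1))
    lines.append(f"{pad}</{name}>")
    return lines
```

Printing "\n".join(to_xml(parse("A*b+c*D+(e+F)*G"))) gives a tree with the same nesting as the one above: every production becomes an <E> element and every matched token an element named after its tsymbol.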



