forrest-dev mailing list archives

From Ross Gardler <>
Subject Forrest-Voice proposal
Date Tue, 05 Jul 2005 11:11:22 GMT
Copied below is Hanax's proposal for the Google Summer of Code 
programme. As I said elsewhere, it is a real shame that a proposal of 
such quality had to be rejected due to the small (but very generous) 
number of awards available to Apache.

I am thrilled to see that Hanax is here to help us implement this 
plugin. As you will see from his proposal, whilst he has little 
experience of Forrest, he does have experience of VoiceXML.


   Apache Forrest is a publishing framework that transforms
   input from various sources into a unified presentation in
   one or more output formats. At present there are several
   output formats that Forrest is capable of producing. Some
   people are unable to access web content visually. Forrest
   is about publishing content in many formats, so it makes
   sense for Forrest to allow differently abled people to
   have access. In addition, visually impaired people cannot
   efficiently access Forrest-based content, nor content
   developed in other document formats, such as MS Word,
   Open Office, Docbook, HTML etc. Since Forrest is able to
   accept a wide range of input formats, this project will
   help address this additional need.
   I'd like to add voice accessibility, so that Forrest:
   1. will be able to read content via a voice synthesiser.
      Reading should be clear, concise and without redundant
      information (all meta-information, such as bold text,
      should be conveyed by prosody).
   2. will be able to make content accessible by speech.
      Navigation should be intuitive.

Achieving the goal
   By using X+V technology. X+V stands for XHTML + Voice,
   which allows one to create web-based voice-controlled
   applications with voice output. The advantage of this
   technology is a quite straightforward mapping of visual
   elements (document sections, TOC...) to audio input/output
   structures (VoiceXML).
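
   To make this mapping concrete, a minimal X+V page might look
   like the following sketch. The namespaces follow the X+V 1.0
   profile; the ids, titles and prompt text are only placeholders:

```xml
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Sample section</title>
    <!-- VoiceXML form that reads the section aloud -->
    <vxml:form id="readSection">
      <vxml:block>
        <vxml:prompt>Section one. Introduction.</vxml:prompt>
      </vxml:block>
    </vxml:form>
  </head>
  <!-- XML Events binding: run the voice form when the page loads -->
  <body ev:event="load" ev:handler="#readSection">
    <h1 id="s1">1. Introduction</h1>
    <p>The visual content of the section goes here.</p>
  </body>
</html>
```

   The same visual structure (the heading and its paragraphs) is
   thus available both on screen and through the TTS engine.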
   Apache Forrest excels in transforming various input
   sources into various output formats. This project will
   extend Forrest to allow it to automatically produce an
   X+V document. X+V defines a sophisticated system of
   structures that can be used to separate individual
   semantic blocks (paragraphs, sections...). These
   structures map well to the existing internal document
   format of Apache Forrest, and hence any Apache Forrest
   content will be capable of being rendered via a TTS
   (text-to-speech) engine. Similarly, Forrest's internal
   structures for site navigation can be used to create X+V
   menus for voice control. This will result in the relevant
   portion of the content being either read by the
   text-to-speech engine or displayed on the browser, as
   appropriate for the individual user. In addition to
   producing an X+V document we also need to produce
   grammars for recognition. We will need at least one
   global grammar for navigation and quick access
   (e.g. 'go to section 4', 'go to menu').
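
   As an illustrative sketch, such a global command could be a
   VoiceXML link that stays active throughout the document, here
   with its grammar written inline in JSGF (the id and phrasing
   below are only examples; one such link would exist per
   navigation target):

```xml
<!-- sketch: a document-wide link whose grammar accepts
     the global 'go to menu' command -->
<vxml:link next="#menu">
  <vxml:grammar type="application/x-jsgf" mode="voice"><![CDATA[
    #JSGF V1.0;
    grammar gotomenu;
    public <command> = go to [the] menu;
  ]]></vxml:grammar>
</vxml:link>
```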

   The successful completion of this project will result in
   an increased level of accessibility to Apache Forrest
   produced content for the visually impaired and physically
   challenged. It is important to realise that Forrest is
   used in projects such as Burrokeet, which produces
   learning objects, and is also used to produce
   documentation for a wide range of projects, including
   many Apache projects. Therefore, the addition of this
   plugin will extend accessibility to the outputs of those
   projects. Of longer term importance is the fact that
   Forrest can accept documents in a wide range of input
   formats; therefore the creation of this plugin will
   facilitate the creation of a tool enabling almost any
   document to be made accessible.

My approach, milestones
1. Familiarization.
    Get familiar with the basics of how Forrest works:
    which formats are supported and how the final structure
    of a document is created.
2. Research.
    This research should answer these questions:
a) How to interpret Forrest content features, such as menus,
    navigation bars and lists of sections, as X+V structures
    (menus, fields...).
b) In order to easily access content via voice commands we
    need to devise an intuitive navigation system. Unlike
    visual models, where people can access data "randomly",
    in audio models we will have strictly sequential access
    to the document. That is, the reader cannot know what is
    at the end of the page until they hear it. But we can
    predefine some bookmarks and let the user skip to them.
    This is flow control - some kind of virtual cursor. The
    main challenge will be to develop an intuitive mechanism
    to navigate the document. I think that navigation is the
    main problem in structured sites accessed via voice.
    Reading of content is quite straightforward thanks to
    the TTS engine.
    In this phase I'll try to make a first draft of the flow
    control - how many "navigation chunks" will be used -
    is the document one big chunk, or is there a way to
    separate it into several smaller ones and navigate
    between them via "goto" jumps?
    The next question is to determine which fields will use
    the global grammar, which makes some special keywords
    global so that they can be used at any time during
    navigation.
c) How semantics and events can help. Using well designed
    semantic tags will improve the navigation logic. Smart
    use of events can also help to reach the goal more
    easily.
3. Implementation.
    a) Basic content - separation into sections, marking,
       and navigation within it.
    b) Menus and navigation bar - mapping to sections and
       making shortcuts - the user probably doesn't want to
       say the whole title, but only a specific abbreviation
       or section number - something short.
    c) Lists of links, connections between documents.
4. Optimizing
    While the X+V document will be automatically generated,
    it will need some optimization:
    - Joining duplicate code.
    - Optimizing semantics.
5. Documentation
    Describing the functionality and how to use it,
    and creating some samples.
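
The "navigation chunks" idea from research question b) could,
for instance, come out of the generator looking roughly like
this sketch (ids and prompt text are placeholders): each section
becomes one VoiceXML form, and a trailing "goto" acts as the
virtual cursor's default forward movement.

```xml
<!-- sketch: each section is one navigation chunk;
     the goto moves the virtual cursor to the next chunk -->
<vxml:form id="section1">
  <vxml:block>
    <vxml:prompt>Section one. Introduction.</vxml:prompt>
    <vxml:goto next="#section2"/>
  </vxml:block>
</vxml:form>
<vxml:form id="section2">
  <vxml:block>
    <vxml:prompt>Section two. Achieving the goal.</vxml:prompt>
  </vxml:block>
</vxml:form>
```

Bookmarks would then simply be extra jump targets into these
chunks, reachable via the global grammar.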

My skills
- I'm currently implementing my diploma thesis, which is
   completely based on X+V. I use a static page and make
   changes via JavaScript, which can work with VoiceXML
   variables in a quite comfortable way. With some "tricks"
   this technology can be used not only in the "transaction"
   way in which it is usually presented (ordering by phone),
   but also in a quite interactive way. These "tricks" mean
   using events and forcing fields to be revisited by
   clearing them according to user input. This makes the
   application loop live forever and gives the user freedom
   in the sense of "what to say next".
   After my experiences I believe that even a structured
   site can be transformed into a voice-read one with quite
   intuitive navigation.
   I can rank myself here as advanced.
- JSGF (Java Speech Grammar Format). I prefer it over
   SRGS (the XML-based format) because it seems more
   readable. Its files are also reusable in some (Java)
   speech recognition engines. I'm also familiar with
   working with semantic tags in JSGF, which play a very
   important part in joining grammars with the document.
   I can rank myself here as advanced.
- I have quite good experience with XSLT (I have translated
   documents in several formats into XSL:FO). My work was
   concerned with creating an exact visual copy of the
   original (with XML mapping). Knowledge of XSL will be
   useful because much of the internal processing of Forrest
   is done by a chain of XSL templates.
   I can rank myself here as intermediate.
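
   For illustration, the kind of template I have in mind for
   this project would map a section of Forrest's intermediate
   format to a VoiceXML form. The input element names (section,
   title, p) are assumptions about Forrest's internal xdoc
   format and may need adjusting:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:vxml="http://www.w3.org/2001/vxml">
  <!-- sketch: turn every section into one voice "chunk" -->
  <xsl:template match="section">
    <vxml:form id="section-{position()}">
      <vxml:block>
        <vxml:prompt>
          <xsl:value-of select="title"/>.
          <!-- built-in rules emit the paragraph text -->
          <xsl:apply-templates select="p"/>
        </vxml:prompt>
      </vxml:block>
    </vxml:form>
  </xsl:template>
</xsl:stylesheet>
```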
- I have solid knowledge and skills in Java - about 2 years
   of practical use, mainly writing objects for backend
   processing (not GUI). But I don't know much about related
   technologies (like servlets, applets...).
   I'm familiar with Eclipse, CVS and UML (as an important
   part of team work, I hope).
   In pure Java I can rank myself as advanced.
   In Java-related technologies I'm a beginner.
   I have discussed the extent of my Java knowledge with the
   project mentor, who believes my skills are adequate.
