opennlp-dev mailing list archives

From Jörn Kottmann (JIRA) <>
Subject [jira] Commented: (OPENNLP-53) Parser should have simple interface to process a tokenized input sentence
Date Tue, 18 Jan 2011 22:36:43 GMT


Jörn Kottmann commented on OPENNLP-53:

The Parse object has a text field of type String and the span field contains a Span object
which contains the character offset and character length of the parse.

Would it be possible to replace this text String with a String array which contains the individual
tokens?
The replaced text String could be created from the String array to maintain backward compatibility
for the next few releases.
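
A rough sketch of that idea, only to make the proposal concrete. TokenBackedSentence is a
hypothetical stand-in and not the actual Parse class; the method names are assumptions as well.
The tokens are the primary data and the old text String is derived from them on demand.

```java
// Hypothetical holder, not part of OpenNLP: the token array is the primary
// representation and the legacy text is derived from it.
final class TokenBackedSentence {
    private final String[] tokens;

    TokenBackedSentence(String[] tokens) {
        // Defensive copy so later changes to the caller's array do not leak in.
        this.tokens = tokens.clone();
    }

    /** Returns a copy of the individual tokens. */
    String[] getTokens() {
        return tokens.clone();
    }

    /** Backward-compatible view: the tokens joined by single spaces. */
    String getText() {
        return String.join(" ", tokens);
    }
}
```

Existing callers that read the text would keep working, while new callers could use the tokens
directly.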

> Parser should have simple interface to process a tokenized input sentence
> -------------------------------------------------------------------------
>                 Key: OPENNLP-53
>                 URL:
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Parser
>            Reporter: Jörn Kottmann
> The parser expects a tokenized sentence as input, but currently it must be converted
> to a string where each token is separated by a whitespace.
> This interface turned out to be inconvenient if the input sentence is
> provided as a list of strings or a string with a token span list. In both cases
> a new string must be created. In this new string the offsets of the individual tokens
> must be remembered in order to retrieve the parse tree out of the Parse objects.
> Create a more convenient way of interacting with an already tokenized sentence which
> is not in a whitespace separated format. 
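
The bookkeeping described in the issue can be sketched as follows. This is a minimal
illustration of the workaround, not OpenNLP code: WhitespaceJoin and its fields are
hypothetical names. It joins the tokens with single spaces and records the start and end
character offset of each token in the resulting string.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: joins tokens and remembers where
// each token ended up in the joined string.
final class WhitespaceJoin {
    /** The tokens joined by single spaces. */
    final String text;
    /** offsets[i] = {start, end} of token i in text, end exclusive. */
    final int[][] offsets;

    private WhitespaceJoin(String text, int[][] offsets) {
        this.text = text;
        this.offsets = offsets;
    }

    static WhitespaceJoin of(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        int[][] offsets = new int[tokens.size()][2];
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            offsets[i][0] = sb.length(); // start offset of token i
            sb.append(tokens.get(i));
            offsets[i][1] = sb.length(); // end offset (exclusive) of token i
        }
        return new WhitespaceJoin(sb.toString(), offsets);
    }

    public static void main(String[] args) {
        WhitespaceJoin joined = of(Arrays.asList("The", "parser", "works", "."));
        System.out.println(joined.text);
        for (int[] o : joined.offsets) {
            System.out.println(o[0] + ".." + o[1] + " " + joined.text.substring(o[0], o[1]));
        }
    }
}
```

Every caller that starts from a token list has to repeat this step today, which is the
inconvenience the issue asks to remove.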

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
