opennlp-dev mailing list archives

From William Colen <william.co...@gmail.com>
Subject Re: Re: English 300k sentences Leipzig Corpus for test
Date Thu, 14 Mar 2013 14:45:27 GMT
Hi,

I could not find a way to convert from Leipzig to any format other than the
DocCat sample. Is it possible to convert from Leipzig to SentenceSample using
the OpenNLP tools?

Thank you,
William
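
For illustration only, here is a minimal sketch of how such a conversion could be done outside OpenNLP. It assumes the Leipzig sentences file holds one record per line in the form `<id><TAB><sentence>`, and that the target is plain text with one sentence per line, which is the layout I understand the sentence detector training tool to read. The class and method names (`LeipzigToSentences`, `stripId`, `convert`) are hypothetical and are not part of OpenNLP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an OpenNLP converter.
class LeipzigToSentences {

    // Assumption: each record is "<id>\t<sentence text>".
    // Lines without a tab are passed through unchanged (trimmed).
    static String stripId(String line) {
        int tab = line.indexOf('\t');
        return (tab >= 0 ? line.substring(tab + 1) : line).trim();
    }

    // Produces one sentence per output line and skips empty records.
    static List<String> convert(List<String> leipzigLines) {
        List<String> sentences = new ArrayList<>();
        for (String line : leipzigLines) {
            String sentence = stripId(line);
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LeipzigToSentences <leipzig-file> <output-file>");
            return;
        }
        List<String> input = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), convert(input), StandardCharsets.UTF_8);
    }
}
```

The output file could then serve as input to the sentence detector, with each later module consuming the output of the one before it.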


On Thu, Mar 14, 2013 at 9:51 AM, Jörn Kottmann <kottmann@gmail.com> wrote:

>
>
>
> -------- Original Message --------
> Subject:        Re: English 300k sentences Leipzig Corpus for test
> Date:   Thu, 14 Mar 2013 09:48:21 -0300
> From:   William Colen <william.colen@gmail.com>
> To:     Jörn Kottmann <kottmann@gmail.com>
>
>
>
> Yes, you can forward.
>
> It is not clear to me how to convert it. I could only find converters from
> Leipzig to DocCat.
>
>
> On Thu, Mar 14, 2013 at 6:09 AM, Jörn Kottmann <kottmann@gmail.com> wrote:
>
>> Do you mind if I forward this to the dev list?
>>
>> Yes, you need to convert the data into input data. The idea
>> is that we process the data with 1.5.2 and 1.5.3 and see if the output
>> is still identical. If it's not identical, it's either a change in our
>> code or a bug.
>>
>> It doesn't really matter which file you download as long as it has enough
>> sentences. It would be nice if you could note in the test plan which one
>> you used.
>>
>> Hopefully I will have some time over the weekend to do the tests on the
>> private data I have.
>>
>> Jörn
>>
>>
>> On 03/13/2013 11:38 PM, William Colen wrote:
>>
>>> Hi, Jörn,
>>>
>>> I would like to start testing with the Leipzig Corpus. Do you know the
>>> steps to do it?
>>>
>>> I downloaded the file named eng_news_2010_300K-text.tar.gz,
>>>
>>>
>>> and now I would use the converter to extract documents from it.
>>>
>>> After that, I would try to use the output of one module as input to the
>>> next.
>>> Is that correct?
>>>
>>> Thank you,
>>> William
>>>
>>>
>>>
>>
>
>
>
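
The regression check described in the quoted message (process the same data with 1.5.2 and 1.5.3, then see whether the output is still identical) comes down to a line-by-line comparison of two output files. Below is a minimal sketch, assuming each version's output was saved to a plain-text file. The class name `OutputDiff` and its command-line arguments are hypothetical, and nothing here is part of OpenNLP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for comparing the output of two tool versions.
class OutputDiff {

    // Returns the 1-based number of the first line that differs,
    // or -1 when both outputs are identical.
    static int firstDifference(List<String> a, List<String> b) {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i + 1;
            }
        }
        // Same prefix: identical only if the lengths match too.
        return a.size() == b.size() ? -1 : common + 1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: OutputDiff <output-old> <output-new>");
            return;
        }
        List<String> oldOut = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> newOut = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        int line = firstDifference(oldOut, newOut);
        System.out.println(line < 0 ? "identical" : "first difference at line " + line);
    }
}
```

A reported difference is only a starting point: it still has to be traced to either a change in the code or a bug.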
