uima-user mailing list archives

From Jens Grivolla <j+...@grivolla.net>
Subject Re: Working with very large text documents
Date Fri, 18 Oct 2013 09:05:05 GMT
On 10/18/2013 10:06 AM, Armin Wegner wrote:

> What are you doing with very large text documents in an UIMA Pipeline,
> for example 9 GB in size.

Just out of curiosity, how can you possibly have 9 GB of text that
represents one document? From a quick look at Project Gutenberg, it seems
that a full book with HTML markup is about 500 kB to 1 MB, so 9 GB is
roughly 9,000 to 18,000 books, about a complete public library's worth.

