incubator-jena-dev mailing list archives

From "Andy Seaborne (JIRA)" <>
Subject [jira] [Commented] (JENA-69) Provide TSVInput reader
Date Mon, 30 May 2011 16:17:47 GMT


Andy Seaborne commented on JENA-69:

I tried to apply patch JENA-69-r1128283.patch and came across a couple of problematic areas:

1/ It changes the TSV format by adding an extra line at the start with the number of variables.

This means the line containing the variable names is line 2 and the data starts on line 3.
The TSV format is defined to have the column names on line 1.  ARQ and 4Store currently generate
compatible formats.


Is the count needed?  Could the app read in the first line, parse out the variables and use
the number of variables found as the count?
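
As a rough sketch of that idea, the count can fall out of splitting the header line on tabs. This is plain Java, not code from the patch or from ARQ; the class name TsvHeader and the method parseVars are made up for illustration, and stripping a leading '?' is an assumption about how the variable names are written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of ARQ or the patch): derive the variable
// names, and therefore the column count, from line 1 of a TSV result set.
public class TsvHeader {

    // Split the header line on tabs; the size of the returned list is the
    // variable count, so no separate count line is needed.
    static List<String> parseVars(String headerLine) {
        List<String> vars = new ArrayList<>();
        if (headerLine == null || headerLine.isEmpty()) {
            return vars; // no columns at all
        }
        // limit -1 keeps trailing empty columns instead of dropping them
        for (String col : headerLine.split("\t", -1)) {
            // Assumption: names may carry a leading '?', which is stripped here
            vars.add(col.startsWith("?") ? col.substring(1) : col);
        }
        return vars;
    }

    public static void main(String[] args) {
        List<String> vars = parseVars("?s\t?p\t?o");
        System.out.println(vars.size() + " variables: " + vars);
    }
}
```

With this approach line 1 stays the column-name line, so the output remains readable by anything that already consumes the existing format.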

2/ There is ResultSetStream for wrapping an Iterator<Binding> of rows to get a ResultSet.

I think it would be better for TSVInput to wrap its iterator of Bindings with ResultSetStream,
which only promises a single pass over the results and avoids materializing the intermediate
results.  It would even be possible for TSVInput to create an iterator that reads rows on
demand rather than fully materializing the results as it currently does.  This would help
stream processing and scalability.
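
To illustrate the non-materializing reader, here is a minimal sketch of an iterator that pulls one TSV line at a time from the input. It is plain Java and deliberately avoids the ARQ classes: TsvRowIterator is a hypothetical name, and Map<String, String> stands in for Binding. Turning cells into Nodes and handing the iterator to ResultSetStream are left out, since those details depend on the ARQ API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical single-pass reader: each row is parsed only when next() is
// called, so at most one pending line is held in memory.
public class TsvRowIterator implements Iterator<Map<String, String>> {
    private final BufferedReader reader;
    private final String[] vars;   // column names taken from line 1
    private String nextLine;       // look-ahead line, null at end of input

    public TsvRowIterator(BufferedReader reader) {
        this.reader = reader;
        String header = readLine();
        this.vars = (header == null || header.isEmpty())
                ? new String[0]
                : header.split("\t", -1);
        this.nextLine = readLine();
    }

    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public Map<String, String> next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String[] cells = nextLine.split("\t", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < vars.length && i < cells.length; i++) {
            // Assumption: an empty cell means the variable is unbound
            if (!cells[i].isEmpty()) {
                row.put(vars[i], cells[i]);
            }
        }
        nextLine = readLine(); // advance the look-ahead
        return row;
    }
}
```

Because the iterator never looks back, it matches the single-pass promise of ResultSetStream; a caller that needs to rewind would have to copy the rows itself.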

> Provide TSVInput reader
> -----------------------
>                 Key: JENA-69
>                 URL:
>             Project: Jena
>          Issue Type: New Feature
>          Components: ARQ, TDB
>            Reporter: Laurent Pellegrino
>            Priority: Blocker
>              Labels: arq, resultset, tdb, tsv, tsvinput
>         Attachments: JENA-69-r1128283.patch, removenodeclib-r1128186-version1.patch,
>                      tsvinput-r1128173-version1.patch, tsvinput-r1128173-version2.patch
> As stated on the mailing list, it is possible to serialize a ResultSet by using the
> TSV format. However, it is not possible to deserialize it (there is no TSVInput implementation).

