commons-issues mailing list archives

From "Holger Stratmann (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CSV-131) save positions of records to enable random access
Date Sun, 07 Sep 2014 22:10:28 GMT

     [ https://issues.apache.org/jira/browse/CSV-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Holger Stratmann updated CSV-131:
---------------------------------
    Description: 
It would be good to have {{CSVRecord}} save its position in the source stream.

Reason: Knowing the position of the records would enable random access to retrieve records
from the source (after reading it once to build an index) if the file is too large to be read
into memory (or if we don't want to read the full file to access a record in the middle).

Additional info: I have created a "random access csv reader" and a "csv viewer" (Swing) for
arbitrarily large CSV files. It requires one additional scan of the file to build an index
(multi-byte charsets supported). The index can be saved to a file so it only needs to be built
once. Because the lexer uses a BufferedReader, we need "internal information" to know where
each record starts.
The change to "core" is minor: one field in {{CSVRecord}}s and some associated methods to
store the position.
Patch will be attached.
Code for random access (both UI and non-UI) will be proposed (and possibly submitted) as a
separate issue. It could also be an independent add-on but requires this one little change
to Commons CSV.
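
Illustration of the indexing idea described above (a minimal sketch, not the attached patch and not Commons CSV API): the class {{RecordOffsetIndex}} and its methods are hypothetical names. It assumes a UTF-8 (or other ASCII-compatible) file, double-quote quoting, and LF or CRLF record separators. It reads raw bytes instead of going through a BufferedReader, so byte offsets are known directly. One scan collects the byte offset at which each record starts; a record can then be fetched with a seek.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Commons CSV. */
public class RecordOffsetIndex {

    /** Scans the file once and returns the byte offset at which each record starts. */
    public static List<Long> buildIndex(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;
            boolean inQuotes = false;
            boolean atRecordStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atRecordStart) {
                    offsets.add(pos);
                    atRecordStart = false;
                }
                if (b == '"') {
                    // An escaped quote ("") toggles twice, so the state is unchanged.
                    inQuotes = !inQuotes;
                } else if (b == '\n' && !inQuotes) {
                    // Only a newline outside quotes ends a record.
                    atRecordStart = true;
                }
                pos++;
            }
        }
        return offsets;
    }

    /** Seeks to record n and returns its raw text without the trailing line terminator. */
    public static String readRecord(Path file, List<Long> offsets, int n) throws IOException {
        long start = offsets.get(n);
        long end = (n + 1 < offsets.size()) ? offsets.get(n + 1) : Files.size(file);
        byte[] buf = new byte[(int) (end - start)];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);
            raf.readFully(buf);
        }
        int len = buf.length;
        while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r')) {
            len--;
        }
        return new String(buf, 0, len, StandardCharsets.UTF_8);
    }
}
```

The raw text returned by {{readRecord}} would still need to be handed to a CSV parser to split it into fields. Scanning bytes works here because the quote and newline bytes never occur inside a UTF-8 multi-byte sequence; other charsets would need position tracking inside the parser, which is what this issue asks for.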


  was:
It would be good to have the position of the {{CSVRecord}} save its position in the source
stream.

Reason: Knowing the position of the records would enable random access to retrieve records
from the source (after reading it once to build an index) if the file is too large to be read
into memory (or if we don't want to read the full file to access a record in the middle).

Additional info: I have created a "random access csv reader" and a "csv viewer" (Swing) for
arbitrarily large CSV files. It requires one additional scan of the file to build an index
(multi-byte charsets supported). The index can be saved to a file so it only needs to be built
once. Because the lexer uses a BufferedReader, we need "internal information" to know where
each record starts.
The change to "core" is minor: one field in {{CSVRecord}}s and some associated methods to
store the position.
Patch will be attached.
Code for random access (both UI and non-UI) will be proposed (and possibly submitted) as a
separate issue. It could also be an independent add-on but requires this one little change
to Commons CSV.



> save positions of records to enable random access
> -------------------------------------------------
>
>                 Key: CSV-131
>                 URL: https://issues.apache.org/jira/browse/CSV-131
>             Project: Commons CSV
>          Issue Type: Improvement
>          Components: Parser
>    Affects Versions: 1.1
>            Reporter: Holger Stratmann
>            Priority: Minor
>         Attachments: PositionTracking_20140907.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> It would be good to have {{CSVRecord}} save its position in the source stream.
> Reason: Knowing the position of the records would enable random access to retrieve records
> from the source (after reading it once to build an index) if the file is too large to be read
> into memory (or if we don't want to read the full file to access a record in the middle).
> Additional info: I have created a "random access csv reader" and a "csv viewer" (Swing)
> for arbitrarily large CSV files. It requires one additional scan of the file to build an index
> (multi-byte charsets supported). The index can be saved to a file so it only needs to be built
> once. Because the lexer uses a BufferedReader, we need "internal information" to know where
> each record starts.
> The change to "core" is minor: one field in {{CSVRecord}}s and some associated methods
> to store the position.
> Patch will be attached.
> Code for random access (both UI and non-UI) will be proposed (and possibly submitted)
> as a separate issue. It could also be an independent add-on but requires this one little change
> to Commons CSV.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
