jena-commits mailing list archives

From a...@apache.org
Subject svn commit: r1629331 - in /jena/site/trunk/content/documentation/csv: design.mdtext get_started.mdtext implementation.mdtext index.mdtext jena-csv-architecture.png
Date Fri, 03 Oct 2014 22:13:48 GMT
Author: andy
Date: Fri Oct  3 22:13:48 2014
New Revision: 1629331

URL: http://svn.apache.org/r1629331
Log:
svn -> git

Modified:
    jena/site/trunk/content/documentation/csv/design.mdtext
    jena/site/trunk/content/documentation/csv/get_started.mdtext
    jena/site/trunk/content/documentation/csv/implementation.mdtext
    jena/site/trunk/content/documentation/csv/index.mdtext
    jena/site/trunk/content/documentation/csv/jena-csv-architecture.png

Modified: jena/site/trunk/content/documentation/csv/design.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/csv/design.mdtext?rev=1629331&r1=1629330&r2=1629331&view=diff
==============================================================================
--- jena/site/trunk/content/documentation/csv/design.mdtext (original)
+++ jena/site/trunk/content/documentation/csv/design.mdtext Fri Oct  3 22:13:48 2014
@@ -1,85 +1,86 @@
-Title: CSV PropertyTable - Design
-
-## Architecture
-
-The architecture of CSV PropertyTable mainly involves 2 components:
-
--    [PropertyTable](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/PropertyTable.java)
--    [GraphPropertyTable](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/GraphPropertyTable.java)
-
-![Picture of architecture of jena-csv](jena-csv-architecture.png "Architecture of jena-csv")
-
-## PropertyTable 
-
-A `PropertyTable` is collection of data that is sufficiently regular in shape it can be treated
as a table.
-That means each subject has a value for each one of the set of properties.
-Irregularity in terms of missing values needs to be handled but not multiple values for the
same property.
-With special storage, a PropertyTable
-
--    is more compact and more amenable to custom storage (e.g. a JSON document store)
--    can have custom indexes on specific columns
--    can guarantee access orders
-
-More explicitly, `PropertyTable` is designed to be a table of RDF terms, or [Nodes](https://svn.apache.org/repos/asf/jena/trunk/jena-core/src/main/java/com/hp/hpl/jena/graph/Node.java)
in Jena. 
-Each [Column](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/Column.java)
of the `PropertyTable` has an unique columnKey `Node` of the predicate (or p for short).
-Each [Row](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/Row.java)
of the `PropertyTable` has an unique rowKey `Node` of the subject (or s for short).
-You can use `getColumn()` to get the `Column` by its columnKey `Node` of the predicate, while
`getRow()` for `Row`.
-
-A `PropertyTable` should be constructed in this workflow (in order):
-
-1.    Create `Columns` using `PropertyTable.createColumn()` for each `Column` of the `PropertyTable`
-2.    Create `Rows` using `PropertyTable.createRow()` for each `Row` of the `PropertyTable`
-3.    For each `Row` created, set a value (`Node`) at the specified `Column`, by calling
`Row.setValue()`
-
-Once a `PropertyTable` is built, tabular data within can be accessed by the API of `PropertyTable.getMatchingRows()`,
`PropertyTable.getColumnValues()`, etc.
-
-## GraphPropertyTable
-
-`GraphPropertyTable` implements the [Graph](https://svn.apache.org/repos/asf/jena/trunk/jena-core/src/main/java/com/hp/hpl/jena/graph/Graph.java)
interface (read-only) over a `PropertyTable`. 
-This is subclass from [GraphBase](https://svn.apache.org/repos/asf/jena/trunk/jena-core/src/main/java/com/hp/hpl/jena/graph/impl/GraphBase.java)
and implements `find()`. 
-The `graphBaseFind()`(for matching a `Triple`) and `propertyTableBaseFind()`(for matching
a whole `Row`) methods can choose the access route based on the find arguments.
-`GraphPropertyTable` holds/wraps an reference of the `PropertyTable` instance, so that such
a `Graph` can be treated in a more table-like fashion.
-
-**Note:** Both `PropertyTable` and `GraphPropertyTable` are *NOT* restricted to CSV data.
-They are supposed to be compatible with any table-like data sources, such as relational databases,
Microsoft Excel, etc.
-
-## GraphCSV
-
-[GraphCSV](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/GraphCSV.java)
is a sub class of GraphPropertyTable aiming at CSV data.
-Its constructor takes a CSV file path as the parameter, parse the file using a CSV Parser,
and makes a `PropertyTable` through `PropertyTableBuilder`.
-
-For CSV to RDF mapping, we establish some basic principles:
-
-### Single-Value and Regular-Shaped CSV Only
-
-In the [CSV-WG](https://www.w3.org/2013/csvw/wiki/Main_Page), it looks like duplicate column
names are not going to be supported. Therefore, we just consider parsing single-valued CSV
tables. 
-There is the current editor working [draft](http://w3c.github.io/csvw/syntax/) from the CSV
on the Web Working Group, which is defining a more regular data out of CSV.
-This is the target for the CSV work of GraphCSV: tabular regular-shaped CSV; not arbitrary,
irregularly shaped CSV.
-
-### No Additional CSV Metadata
-
-A CSV file with no additional metadata is directly mapped to RDF, which makes a simpler case
compared to SQL-to-RDF work. 
-It's not necessary to have a defined primary column, similar to the primary key of database.
The subject of the triple can be generated through one of:
-
-1.    The triples for each row have a blank node for the subject, e.g. something like the
illustration
-2.    The triples for row N have a subject URI which is `<FILE#_N>`.
-
-### Data Type for Typed Literal
-
-All the values in CSV are parsed as strings line by line. As a better option for the user
to turn on, a dynamic choice which is a posh way of saying attempt to parse it as an integer
(or decimal, double, date) and if it passes, it's an integer (or decimal, double, date).
-Note that for the current release, all of the numbers are parsed as `double`, and `date`
is not supported yet.
-
-### File Path as Namespace
-
-RDF requires that the subjects and the predicates are URIs. We need to pass in the namespaces
(or just the default namespaces) to make URIs by combining the namespaces with the values
in CSV.
-We don’t have metadata of the namespaces for the columns, But subjects can be blank
nodes which is useful because each row is then a new blank node. For predicates, suppose the
URL of the CSV file is `file:///c:/town.csv`, then the columns can be `<file:///c:/town.csv#Town>`
and `<file:///c:/town.csv#Population>`, as is showed in the illustration.
-
-### First Line of Table Header Needed as Predicates
-
-The first line of the CSV file must be the table header. The columns of the first line are
parsed as the predicates of the RDF triples. The RDF triple data are parsed starting from
the second line.
-
-### UTF-8 Encoded Only
-
-The CSV files must be UTF-8 encoded. If your CSV files are using Western European encodings,
please change the encoding before using CSV PropertyTable.
-
-
+Title: CSV PropertyTable - Design
+
+## Architecture
+
+The architecture of CSV PropertyTable mainly involves 2 components:
+
+-    [PropertyTable](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/org/apache/jena/propertytable/PropertyTable.java)
+-    [GraphPropertyTable](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/org/apache/jena/propertytable/impl/GraphPropertyTable.java)
+
+![Picture of architecture of jena-csv](jena-csv-architecture.png "Architecture of jena-csv")
+
+## PropertyTable 
+
+A `PropertyTable` is a collection of data that is sufficiently regular in shape that it can be treated as a table.
+That means each subject has a value for each one of the set of properties.
+Irregularity in the form of missing values needs to be handled, but multiple values for the same property do not.
+With special storage, a PropertyTable
+
+-    is more compact and more amenable to custom storage (e.g. a JSON document store)
+-    can have custom indexes on specific columns
+-    can guarantee access orders
+
+More explicitly, `PropertyTable` is designed to be a table of RDF terms, or 
+[Nodes](https://github.com/apache/jena/tree/master/jena-core/src/main/java/com/hp/hpl/jena/graph/Node.java)
in Jena. 
+Each [Column](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/org/apache/jena/propertytable/Column.java) of the `PropertyTable` has a unique columnKey `Node` for the predicate (or p for short).
+Each [Row](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/org/apache/jena/propertytable/Row.java) of the `PropertyTable` has a unique rowKey `Node` for the subject (or s for short).
+Use `getColumn()` to get a `Column` by its columnKey `Node` of the predicate, and `getRow()` to get a `Row`.
+
+A `PropertyTable` should be constructed in this workflow (in order):
+
+1.    Create `Columns` using `PropertyTable.createColumn()` for each `Column` of the `PropertyTable`
+2.    Create `Rows` using `PropertyTable.createRow()` for each `Row` of the `PropertyTable`
+3.    For each `Row` created, set a value (`Node`) at the specified `Column`, by calling
`Row.setValue()`
+
+Once a `PropertyTable` is built, the tabular data within it can be accessed through API methods such as `PropertyTable.getMatchingRows()` and `PropertyTable.getColumnValues()`.
+
+## GraphPropertyTable
+
+`GraphPropertyTable` implements the [Graph](https://github.com/apache/jena/tree/master/jena-core/src/main/java/com/hp/hpl/jena/graph/Graph.java) interface (read-only) over a `PropertyTable`.
+It is a subclass of [GraphBase](https://github.com/apache/jena/tree/master/jena-core/src/main/java/com/hp/hpl/jena/graph/impl/GraphBase.java) and implements `find()`.
+The `graphBaseFind()` (for matching a `Triple`) and `propertyTableBaseFind()` (for matching a whole `Row`) methods can choose the access route based on the find arguments.
+`GraphPropertyTable` wraps a reference to the `PropertyTable` instance, so that such a `Graph` can be treated in a more table-like fashion.
+
+**Note:** Both `PropertyTable` and `GraphPropertyTable` are *NOT* restricted to CSV data.
+They are supposed to be compatible with any table-like data sources, such as relational databases,
Microsoft Excel, etc.
+
+## GraphCSV
+
+[GraphCSV](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/org/apache/jena/propertytable/impl/GraphCSV.java) is a subclass of `GraphPropertyTable` aimed at CSV data.
+Its constructor takes a CSV file path as the parameter, parses the file using a CSV parser, and builds a `PropertyTable` through `PropertyTableBuilder`.
+
+For CSV to RDF mapping, we establish some basic principles:
+
+### Single-Value and Regular-Shaped CSV Only
+
+In the [CSV-WG](https://www.w3.org/2013/csvw/wiki/Main_Page), it looks like duplicate column names are not going to be supported. Therefore, we only consider parsing single-valued CSV tables.
+The current editor's working [draft](http://w3c.github.io/csvw/syntax/) from the CSV on the Web Working Group is defining a more regular form of data out of CSV.
+This is the target for the CSV work of GraphCSV: tabular, regular-shaped CSV, not arbitrary, irregularly shaped CSV.
+
+### No Additional CSV Metadata
+
+A CSV file with no additional metadata is mapped directly to RDF, which is a simpler case than SQL-to-RDF work.
+It's not necessary to have a defined primary column, similar to the primary key of a database. The subject of the triple can be generated in one of two ways:
+
+1.    The triples for each row have a blank node for the subject, e.g. something like the
illustration
+2.    The triples for row N have a subject URI which is `<FILE#_N>`.
+
+### Data Type for Typed Literal
+
+All the values in CSV are parsed as strings line by line. A better option, which the user can turn on, is a dynamic choice: attempt to parse the value as an integer (or decimal, double, date) and, if that succeeds, treat it as an integer (or decimal, double, date).
+Note that for the current release, all of the numbers are parsed as `double`, and `date` is not supported yet.
+
+### File Path as Namespace
+
+RDF requires that the subjects and the predicates are URIs. We need to pass in the namespaces (or just the default namespaces) to make URIs by combining the namespaces with the values in CSV.
+We don’t have metadata of the namespaces for the columns, but subjects can be blank nodes, which is useful because each row is then a new blank node. For predicates, suppose the URL of the CSV file is `file:///c:/town.csv`; then the columns can be `<file:///c:/town.csv#Town>` and `<file:///c:/town.csv#Population>`, as shown in the illustration.
+
+### First Line of Table Header Needed as Predicates
+
+The first line of the CSV file must be the table header. The columns of the first line are
parsed as the predicates of the RDF triples. The RDF triple data are parsed starting from
the second line.
+
+### UTF-8 Encoded Only
+
+The CSV files must be UTF-8 encoded. If your CSV files are using Western European encodings,
please change the encoding before using CSV PropertyTable.
+
+

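The construction workflow that design.mdtext describes above (create the columns, then the rows, then set each cell value) can be sketched in plain Java. This is only an illustration under stated assumptions, not the jena-csv API: `MiniPropertyTable` and all of its methods are hypothetical stand-ins, and plain strings take the place of Jena `Node` objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; NOT the jena-csv PropertyTable API.
// Strings play the role of Jena Nodes.
class MiniPropertyTable {
    // columnKey (predicate) -> (rowKey (subject) -> single value)
    private final Map<String, Map<String, String>> columns = new LinkedHashMap<>();
    private final List<String> rows = new ArrayList<>();

    // Step 1: declare a column for a predicate.
    void createColumn(String predicate) {
        columns.putIfAbsent(predicate, new LinkedHashMap<>());
    }

    // Step 2: declare a row for a subject.
    void createRow(String subject) {
        if (!rows.contains(subject)) rows.add(subject);
    }

    // Step 3: set the single value of a cell; the row and column must already exist.
    void setValue(String subject, String predicate, String value) {
        if (!rows.contains(subject)) throw new IllegalArgumentException("unknown row: " + subject);
        Map<String, String> column = columns.get(predicate);
        if (column == null) throw new IllegalArgumentException("unknown column: " + predicate);
        column.put(subject, value);
    }

    // Returns the value at (subject, predicate), or null for a missing cell.
    String getValue(String subject, String predicate) {
        Map<String, String> column = columns.get(predicate);
        return column == null ? null : column.get(subject);
    }

    // Returns the subjects whose cell in the given column equals the given value.
    List<String> getMatchingRows(String predicate, String value) {
        List<String> result = new ArrayList<>();
        Map<String, String> column = columns.get(predicate);
        if (column == null) return result;
        for (Map.Entry<String, String> cell : column.entrySet()) {
            if (cell.getValue().equals(value)) result.add(cell.getKey());
        }
        return result;
    }
}
```

Storing one value per (row, column) pair mirrors the single-value requirement, and a missing cell simply has no entry, which is the one kind of irregularity the design says must be tolerated.
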
Modified: jena/site/trunk/content/documentation/csv/get_started.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/csv/get_started.mdtext?rev=1629331&r1=1629330&r2=1629331&view=diff
==============================================================================
--- jena/site/trunk/content/documentation/csv/get_started.mdtext (original)
+++ jena/site/trunk/content/documentation/csv/get_started.mdtext Fri Oct  3 22:13:48 2014
@@ -1,81 +1,84 @@
-Title: CSV PropertyTable - Get Started
-
-## Using CSV PropertyTable with Apache Maven
-
-See ["Using Jena with Apache Maven"](http://jena.apache.org/download/maven.html) for full
details.
-
-    <dependency>
-       <groupId>org.apache.jena</groupId>
-       <artifactId>jena-csv</artifactId>
-       <version>X.Y.Z</version>
-    </dependency>
-
-## Using CSV PropertyTable from Java through the API
-
-In order to switch on CSV PropertyTable, it's required to register `LangCSV` into [Jena RIOT](http://jena.apache.org/documentation/io/),
through a simple method call:
-
-	import org.apache.jena.propertytable.lang.LangCSV;
-	... 
-    LangCSV.register(); 
-
-It's a static method call of registration, which needs to be run just one time for an application
before using CSV PropertyTable (e.g. during the initialization phase).
-
-Once registered, CSV PropertyTable provides 2 ways for the users to play with (i.e. GraphCSV
and RIOT):
-
-### GraphCSV
-
-[GraphCSV](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/graph/GraphCSV.java)
wrappers a CSV file as a Graph, which makes a Model for SPARQL query:
-
-    Model model = ModelFactory.createModelForGraph(new GraphCSV("data.csv")) ;
-    QueryExecution qExec = QueryExecutionFactory.create(query, model) ;
-
-or for multiple CSV files and/or other RDF data:
-    
-    Model csv1 = ModelFactory.createModelForGraph(new GraphCSV("data1.csv")) ;
-    Model csv2 = ModelFactory.createModelForGraph(new GraphCSV("data2.csv")) ;
-    Model other = ModelFactory.createModelForGraph(otherGraph) ;
-    Dataset dataset = ... ;
-    dataset.addNamedModel("http://example/table1", csv1) ;
-    dataset.addNamedModel("http://example/table2", csv2) ;
-    dataset.addNamedModel("http://example/other", other) ;
-    ... normal SPARQL execution ...
-
-You can also find the full examples from [GraphCSVTest](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/test/java/org/apache/jena/propertytable/graph/GraphCSVTest.java).
-
-In short, for Jena ARQ, a CSV table is actually a Graph (i.e. GraphCSV), without any differences
from other types of Graphs when using it from the Jena ARQ API.
-
-### RIOT
-
-When LangCSV is registered into RIOT, CSV PropertyTable adds a new RDF syntax of '.csv' with
the content type of "text/csv".
-You can read ".csv" files into Model following the standard RIOT usages:
-
-    // Usage 1: Direct reading through Model
-    Model model_1 = ModelFactory.createDefaultModel()
-    model.read("test.csv") ;
-    
-    // Usage 2: Reading using RDFDataMgr
-    Model model_2 = RDFDataMgr.loadModel("test.csv") ;
-
-For more information, see [Reading RDF in Apache Jena](http://jena.apache.org/documentation/io/rdf-input.html).
-
-Note that, the requirements for the CSV files are listed in the documentation of [Design](design.html).
CSV PropertyTable only supports **single-Value**, **regular-Shaped**, **table-headed** and
**UTF-8-encoded** CSV files (**NOT** Microsoft Excel files).
-
-## Command Line Tool
-
-[csv2rdf](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/riotcmd/csv2rdf.java)
is a tool for direct transforming from CSV to the formatted RDF syntax of N-Triples.
-The script calls the `csv2rdf` java program in the `riotcmd` package in this way:
-
-    java -cp ... riotcmd.csv2rdf -dest=outputFile inputFile ...
-
-It transforms the CSV `inputFile` into N-Triples `outputFile`. For example,
-
-    java -cp ... riotcmd.csv2rdf --dest=test.ntriples src/test/resources/test.csv
-
-The script reuses [Common framework for running RIOT parsers](https://svn.apache.org/repos/asf/jena/trunk/jena-arq/src/main/java/riotcmd/CmdLangParse.java),
so that it also accepts the same arguments (type `"riot --help"` to get command line reminders)
from [RIOT Command line tools](https://jena.apache.org/documentation/io/#command-line-tools):
-
--   `--validate`: Checking mode: same as `--strict --sink --check=true`
--   `--check=true/false`: Run with checking of literals and IRIs either on or off.
--   `--sink`: No output of triples or quads in the standard output (i.e. `System.out`).
--   `--time`: Output timing information.
-
-
+Title: CSV PropertyTable - Get Started
+
+## Using CSV PropertyTable with Apache Maven
+
+See ["Using Jena with Apache Maven"](http://jena.apache.org/download/maven.html) for full
details.
+
+    <dependency>
+       <groupId>org.apache.jena</groupId>
+       <artifactId>jena-csv</artifactId>
+       <version>X.Y.Z</version>
+    </dependency>
+
+## Using CSV PropertyTable from Java through the API
+
+In order to switch on CSV PropertyTable, it's required to register `LangCSV` into [Jena RIOT](http://jena.apache.org/documentation/io/),
through a simple method call:
+
+	import org.apache.jena.propertytable.lang.CSV2RDF;
+	... 
+        CSV2RDF.init() ;
+
+It's a static method call of registration, which needs to be run just one time for an application
before using CSV PropertyTable (e.g. during the initialization phase).
+
+Once registered, CSV PropertyTable provides 2 ways for the users to play with (i.e. GraphCSV
and RIOT):
+
+### GraphCSV
+
+[GraphCSV](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/org/apache/jena/propertytable/graph/GraphCSV.java) wraps a CSV file as a Graph, from which a Model can be made for SPARQL queries:
+
+    Model model = ModelFactory.createModelForGraph(new GraphCSV("data.csv")) ;
+    QueryExecution qExec = QueryExecutionFactory.create(query, model) ;
+
+or for multiple CSV files and/or other RDF data:
+    
+    Model csv1 = ModelFactory.createModelForGraph(new GraphCSV("data1.csv")) ;
+    Model csv2 = ModelFactory.createModelForGraph(new GraphCSV("data2.csv")) ;
+    Model other = ModelFactory.createModelForGraph(otherGraph) ;
+    Dataset dataset = ... ;
+    dataset.addNamedModel("http://example/table1", csv1) ;
+    dataset.addNamedModel("http://example/table2", csv2) ;
+    dataset.addNamedModel("http://example/other", other) ;
+    ... normal SPARQL execution ...
+
+You can also find the full examples from [GraphCSVTest](https://github.com/apache/jena/tree/master/jena-csv/src/test/java/org/apache/jena/propertytable/graph/GraphCSVTest.java).
+
+In short, for Jena ARQ, a CSV table is actually a Graph (i.e. GraphCSV), without any differences
from other types of Graphs when using it from the Jena ARQ API.
+
+### RIOT
+
+When LangCSV is registered into RIOT, CSV PropertyTable adds a new RDF syntax of '.csv' with
the content type of "text/csv".
+You can read ".csv" files into Model following the standard RIOT usages:
+
+    // Usage 1: Direct reading through Model
+    Model model_1 = ModelFactory.createDefaultModel() ;
+    model_1.read("test.csv") ;
+    
+    // Usage 2: Reading using RDFDataMgr
+    Model model_2 = RDFDataMgr.loadModel("test.csv") ;
+
+For more information, see [Reading RDF in Apache Jena](http://jena.apache.org/documentation/io/rdf-input.html).
+
+Note that the requirements for the CSV files are listed in the [Design](design.html) documentation. CSV PropertyTable only supports **single-valued**, **regular-shaped**, **table-headed** and **UTF-8-encoded** CSV files (**NOT** Microsoft Excel files).
+
+## Command Line Tool
+
+[csv2rdf](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/riotcmd/csv2rdf.java) is a tool that transforms CSV directly into the RDF syntax of N-Triples.
+The script calls the `csv2rdf` java program in the `riotcmdx` package in this way:
+
+    java -cp ... riotcmdx.csv2rdf inputFile ...
+
+It transforms the CSV `inputFile` into N-Triples. For example,
+
+    java -cp ... riotcmdx.csv2rdf src/test/resources/test.csv
+
+The script reuses [Common framework for running RIOT parsers](../io/index.html),
+so that it also accepts the same arguments
+(type `"riot --help"` to get command line reminders) from 
+[RIOT Command line tools](https://jena.apache.org/documentation/io/#command-line-tools):
+
+-   `--validate`: Checking mode: same as `--strict --sink --check=true`
+-   `--check=true/false`: Run with checking of literals and IRIs either on or off.
+-   `--sink`: No output of triples or quads in the standard output (i.e. `System.out`).
+-   `--time`: Output timing information.
+
+

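The file requirements that get_started.mdtext lists above (table-headed, single-valued, regular-shaped) can be checked before a file is handed to CSV PropertyTable. The following is a rough sketch under stated assumptions: `CsvShapeCheck` and `isRegular` are hypothetical names that are not part of jena-csv, and the check splits naively on commas, so it does not understand quoted fields.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical pre-flight check for illustration; not part of jena-csv.
class CsvShapeCheck {
    // Returns true if the text starts with a header line, the header has no
    // duplicate column names, and every non-blank data line has as many cells
    // as the header. Naive: splits on commas and ignores CSV quoting rules.
    static boolean isRegular(String csvText) {
        String[] lines = csvText.split("\r?\n");
        if (lines.length == 0 || lines[0].trim().isEmpty()) return false; // no header line
        String[] header = lines[0].split(",", -1);
        Set<String> seen = new HashSet<>();
        for (String name : header) {
            if (!seen.add(name.trim())) return false; // duplicate column name
        }
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) continue; // tolerate blank lines
            if (lines[i].split(",", -1).length != header.length) return false;
        }
        return true;
    }
}
```

When the text comes from a file, it would need to be decoded as UTF-8 first (for example with `StandardCharsets.UTF_8`), since that is the only encoding the documentation says is supported.
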
Modified: jena/site/trunk/content/documentation/csv/implementation.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/csv/implementation.mdtext?rev=1629331&r1=1629330&r2=1629331&view=diff
==============================================================================
--- jena/site/trunk/content/documentation/csv/implementation.mdtext (original)
+++ jena/site/trunk/content/documentation/csv/implementation.mdtext Fri Oct  3 22:13:48 2014
@@ -1,41 +1,41 @@
-Title: CSV PropertyTable - Implementation
-
-## PropertyTable Implementations
-
-There're 2 implementations for `PropertyTable`. The pros and cons are summarised in the following
table: 
-
-PropertyTable Implementation | Description | Supported Indexes | Advantages | Disadvantages
 
----------------------------- | ----------- | ----------------- | ---------- | -------------

-`PropertyTableArrayImpl` | implemented by a two-dimensioned Java array of `Nodes`| SPO, PSO
| compact memory usage, fast for querying with S and P, fast for query a whole `Row` | slow
for query with O, table Row/Column size provided |
-`PropertyTableHashMapImpl` | implemented by several Java `HashMaps` | PSO, POS | fast for
querying with O, table Row/Column size not required | more memory usage for HashMaps |
-
-By default, [PropertyTableArrayImpl](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/PropertyTableArrayImpl.java)
is used as the `PropertyTable` implementation held by `GraphCSV`.
-If you want to switch to [PropertyTableHashMapImpl](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/PropertyTableHashMapImpl.java),
just use the static method of `GraphCSV.createHashMapImpl()` to replace the default `new GraphCSV()`
way.
-Here is an example:
-
-    Model model_csv_array_impl = ModelFactory.createModelForGraph(new GraphCSV(file)); //
PropertyTableArrayImpl
-    Model model_csv_hashmap_impl = ModelFactory.createModelForGraph(GraphCSV.createHashMapImpl(file));
// PropertyTableHashMapImpl
-
-## StageGenerator Optimization for GraphPropertyTable
-
-Accessing from SPARQL via `Graph.find()` will work, but it's not ideal. Some optimizations
can be done for processing a SPARQL basic graph pattern. More explicitly, in the method of
`OpExecutor.execute(OpBGP, ...)`, when the target for the query is a `GraphPropertyTable`,
it can get a whole `Row`, or `Rows`, of the table data and match the pattern with the bindings.
-
-The optimization of querying a whole `Row` in the PropertyTable are supported now.
-The following query pattern can be transformed into a `Row` querying, without generating
triples:
-
-    ?x :prop1 ?v .
-    ?x :prop2 ?w .
-    ...
-
-It's made by using the extension point of `StageGenerator`, because it's now just concerned
with `BasicPattern`.
-The detailed workflow goes in this way:
-
-1.    Split the incoming `BasicPattern` by subjects, (i.e. it becomes multiple sub BasicPatterns
grouped by the same subjects. (see [QueryIterPropertyTable](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/QueryIterPropertyTable.java)
)
-2.    For each sub `BasicPattern`, if the `Triple` size within is greater than 1 (i.e. at
least 2 `Triples`), it's turned into a `Row` querying, and processed by [QueryIterPropertyTableRow](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/QueryIterPropertyTableRow.java),
else if it contains only 1 `Triple`, it goes for the traditional `Triple` querying by `graph.graphBaseFind()`
-
-In order to turn on this optimization, we need to register the [StageGeneratorPropertyTable](https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/StageGeneratorPropertyTable.java)
into ARQ context, before performing SPARQL querying:
-
-    StageGenerator orig = (StageGenerator)ARQ.getContext().get(ARQ.stageGenerator) ;
-    StageGenerator stageGenerator = new StageGeneratorPropertyTable(orig) ;
-    StageBuilder.setGenerator(ARQ.getContext(), stageGenerator) ;
-
+Title: CSV PropertyTable - Implementation
+
+## PropertyTable Implementations
+
+There are two implementations of `PropertyTable`. Their pros and cons are summarised in the following table:
+
+PropertyTable Implementation | Description | Supported Indexes | Advantages | Disadvantages
+---------------------------- | ----------- | ----------------- | ---------- | -------------
+`PropertyTableArrayImpl` | implemented by a two-dimensional Java array of `Nodes` | SPO, PSO | compact memory usage, fast for querying with S and P, fast for querying a whole `Row` | slow for querying with O, table Row/Column size must be provided |
+`PropertyTableHashMapImpl` | implemented by several Java `HashMaps` | PSO, POS | fast for querying with O, table Row/Column size not required | more memory usage for HashMaps |
+
+By default, [PropertyTableArrayImpl](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/org/apache/jena/propertytable/impl/PropertyTableArrayImpl.java) is used as the `PropertyTable` implementation held by `GraphCSV`.
+To switch to [PropertyTableHashMapImpl](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/org/apache/jena/propertytable/impl/PropertyTableHashMapImpl.java), use the static method `GraphCSV.createHashMapImpl()` in place of the default `new GraphCSV()`.
+Here is an example:
+
+    Model model_csv_array_impl = ModelFactory.createModelForGraph(new GraphCSV(file)); // PropertyTableArrayImpl
+    Model model_csv_hashmap_impl = ModelFactory.createModelForGraph(GraphCSV.createHashMapImpl(file)); // PropertyTableHashMapImpl
+
+## StageGenerator Optimization for GraphPropertyTable
+
+Accessing from SPARQL via `Graph.find()` will work, but it's not ideal. Some optimizations
can be done for processing a SPARQL basic graph pattern. More explicitly, in the method of
`OpExecutor.execute(OpBGP, ...)`, when the target for the query is a `GraphPropertyTable`,
it can get a whole `Row`, or `Rows`, of the table data and match the pattern with the bindings.
+
+The optimization of querying a whole `Row` in the PropertyTable is supported now.
+The following query pattern can be transformed into a `Row` query, without generating triples:
+
+    ?x :prop1 ?v .
+    ?x :prop2 ?w .
+    ...
+
+This is done through the `StageGenerator` extension point, because the optimization is currently concerned only with `BasicPattern`.
+The detailed workflow is as follows:
+
+1.    Split the incoming `BasicPattern` by subject, i.e. it becomes multiple sub `BasicPatterns`, each grouped by the same subject (see [QueryIterPropertyTable](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/org/apache/jena/propertytable/impl/QueryIterPropertyTable.java)).
+2.    For each sub `BasicPattern`, if it contains more than one `Triple` (i.e. at least 2 `Triples`), it is turned into a `Row` query and processed by [QueryIterPropertyTableRow](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/org/apache/jena/propertytable/impl/QueryIterPropertyTableRow.java); if it contains only 1 `Triple`, it goes through the traditional `Triple` query by `graph.graphBaseFind()`.
+
+In order to turn on this optimization, we need to register the [StageGeneratorPropertyTable](https://github.com/apache/jena/tree/master/jena-csv/src/main/java/org/apache/jena/propertytable/impl/StageGeneratorPropertyTable.java)
into ARQ context, before performing SPARQL querying:
+
+    StageGenerator orig = (StageGenerator)ARQ.getContext().get(ARQ.stageGenerator) ;
+    StageGenerator stageGenerator = new StageGeneratorPropertyTable(orig) ;
+    StageBuilder.setGenerator(ARQ.getContext(), stageGenerator) ;
+

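The two-step workflow that implementation.mdtext describes above (group the triple patterns of a basic graph pattern by subject, then pick a row lookup or a triple lookup per group) can be sketched in plain Java. This is an illustration only, not the code of `QueryIterPropertyTable`: `BgpSplitter`, `groupBySubject` and `accessRoute` are hypothetical names, and each triple pattern is modelled as a three-element string array instead of a Jena `Triple`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the subject-grouping step; not the jena-csv implementation.
class BgpSplitter {
    // Groups triple patterns such as {"?x", ":prop1", "?v"} by their subject,
    // keeping the order in which subjects first appear.
    static Map<String, List<String[]>> groupBySubject(List<String[]> pattern) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] triple : pattern) {
            groups.computeIfAbsent(triple[0], subject -> new ArrayList<>()).add(triple);
        }
        return groups;
    }

    // A group of two or more triple patterns is a candidate for a whole-row
    // lookup; a single triple pattern falls back to ordinary triple matching.
    static String accessRoute(List<String[]> group) {
        return group.size() > 1 ? "row" : "triple";
    }
}
```

For the pattern `?x :prop1 ?v . ?x :prop2 ?w .` both triple patterns land in the `?x` group, so that group takes the row route.
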
Modified: jena/site/trunk/content/documentation/csv/index.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/csv/index.mdtext?rev=1629331&r1=1629330&r2=1629331&view=diff
==============================================================================
--- jena/site/trunk/content/documentation/csv/index.mdtext (original)
+++ jena/site/trunk/content/documentation/csv/index.mdtext Fri Oct  3 22:13:48 2014
@@ -1,60 +1,60 @@
-Title: CSV PropertyTable
-
-This module is about getting CSVs into a form that is amenable to Jena SPARQL processing,
and doing so in a way that is not specific to CSV files. 
-It includes getting the right architecture in place for regular table shaped data, using
the core abstraction of PropertyTable.
-
-*Illustration*
-
-This module involves the basic mapping of CSV to RDF using a fixed algorithm, including interpreting
data as numbers or strings.
-
-Suppose we have a CSV file located in “file:///c:/town.csv”, which has one header
row, two data rows:
-
-    Town,Population
-    Southton,123000
-    Northville,654000
- 
-As RDF this might be viewable as:
- 
-    @prefix : <file:///c:/town.csv#> .
-    @prefix csv: <http://w3c/future-csv-vocab/> .
-    [ csv:row 1 ; :Town "Southton" ; :Population  “123000”^^http://www.w3.org/2001/XMLSchema#int
] .
-    [ csv:row 2 ; :Town "Northville" ; :Population  “654000”^^http://www.w3.org/2001/XMLSchema#int
 ] .
- 
-or without the bnode abbreviation:
- 
-    @prefix : <file:///c:/town.csv#> .
-    @prefix csv: <http://w3c/future-csv-vocab/> .
-    _:b0  csv:row 1 ;
-          :Town "Southton" ;
-          :Population “123000”^^http://www.w3.org/2001/XMLSchema#int .
-    _:b1  csv:row 2 ;
-          :Town "Northville" ;
-          :Population “654000”^^http://www.w3.org/2001/XMLSchema#int.
-          
-Each row is modeling one "entity" (here, a population observation). 
-There is a subject (a blank node) and one predicate-value for each cell of the row. 
-Row numbers are added because it can be important. 
-Now the CSV file is viewed as a graph - normal, unmodified SPARQL can be used. 
-Multiple CSVs files can be multiple graphs in one dataset to give query across different
data sources.
- 
-We can use the following SPARQL query for “Towns over 500,000 people” mentioned
in the CSV file:
- 
-    SELECT ?townName ?pop {
-      GRAPH <file:///c:/town.csv> {
-        ?x :Town ?townName ;
-           :Popuation ?pop .
-        FILTER(?pop > 500000)
-      }
-    }
-
-What's more, we make some room for future extension through `PropertyTable`.
-The [architecture](design.html) is designed to be able to accommodate any table-like data
sources, such as relational databases, Microsoft Excel, etc.
-
-## Documentation
-
--   [Get Started](get_started.html)
--   [Design](design.html)
--   [Implementation](implementation.html)
-
-
-
+Title: CSV PropertyTable
+
+This module is about getting CSVs into a form that is amenable to Jena SPARQL processing,
and doing so in a way that is not specific to CSV files. 
+It includes getting the right architecture in place for regular table shaped data, using
the core abstraction of PropertyTable.
+
+*Illustration*
+
+This module involves the basic mapping of CSV to RDF using a fixed algorithm, including interpreting
data as numbers or strings.
+
+Suppose we have a CSV file located at “file:///c:/town.csv”, which has one header row and two data rows:
+
+    Town,Population
+    Southton,123000
+    Northville,654000
+ 
+As RDF this might be viewable as:
+ 
+    @prefix : <file:///c:/town.csv#> .
+    @prefix csv: <http://w3c/future-csv-vocab/> .
+    [ csv:row 1 ; :Town "Southton" ; :Population  "123000"^^<http://www.w3.org/2001/XMLSchema#int> ] .
+    [ csv:row 2 ; :Town "Northville" ; :Population  "654000"^^<http://www.w3.org/2001/XMLSchema#int> ] .
+ 
+or without the bnode abbreviation:
+ 
+    @prefix : <file:///c:/town.csv#> .
+    @prefix csv: <http://w3c/future-csv-vocab/> .
+    _:b0  csv:row 1 ;
+          :Town "Southton" ;
+          :Population "123000"^^<http://www.w3.org/2001/XMLSchema#int> .
+    _:b1  csv:row 2 ;
+          :Town "Northville" ;
+          :Population "654000"^^<http://www.w3.org/2001/XMLSchema#int> .
+          
+Each row is modeling one "entity" (here, a population observation). 
+There is a subject (a blank node) and one predicate-value for each cell of the row. 
+Row numbers are added because they can be important.
+Now the CSV file is viewed as a graph - normal, unmodified SPARQL can be used.
+Multiple CSV files can be multiple graphs in one dataset, which allows querying across different data sources.
+ 
+We can use the following SPARQL query for “Towns over 500,000 people” mentioned in the CSV file:
+ 
+    PREFIX : <file:///c:/town.csv#>
+    SELECT ?townName ?pop {
+      GRAPH <file:///c:/town.csv> {
+        ?x :Town ?townName ;
+           :Population ?pop .
+        FILTER(?pop > 500000)
+      }
+    }
+
+What's more, we make some room for future extension through `PropertyTable`.
+The [architecture](design.html) is designed to be able to accommodate any table-like data
sources, such as relational databases, Microsoft Excel, etc.
+
+## Documentation
+
+-   [Get Started](get_started.html)
+-   [Design](design.html)
+-   [Implementation](implementation.html)
+
+
+

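The fixed mapping that index.mdtext illustrates above (column names become predicates in the namespace of the CSV file, and cells are interpreted as numbers or strings) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not jena-csv code: `CsvCellMapping`, `predicateUri` and `typedValue` are hypothetical names. The sketch follows the note in the design document that numbers are currently parsed as `double`, so it does not reproduce the `xsd:int` literals shown in the illustration.

```java
// Hypothetical helper for illustration; the names are not jena-csv API.
class CsvCellMapping {
    // Builds a predicate URI as: CSV file URL + "#" + column name.
    static String predicateUri(String fileUrl, String columnName) {
        return fileUrl + "#" + columnName;
    }

    // Dynamic typing: try the cell as a double; if parsing fails,
    // keep the cell as a plain string.
    static Object typedValue(String cell) {
        try {
            return Double.valueOf(cell.trim());
        } catch (NumberFormatException e) {
            return cell;
        }
    }
}
```

`Double.valueOf` is more permissive than a CSV author might expect (it also accepts inputs such as `1e3` or `NaN`), so a stricter check would be needed wherever such cells should stay strings.
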
Modified: jena/site/trunk/content/documentation/csv/jena-csv-architecture.png
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/csv/jena-csv-architecture.png?rev=1629331&r1=1629330&r2=1629331&view=diff
==============================================================================
Binary files - no diff available.


