devicemap-commits mailing list archives

From re...@apache.org
Subject svn commit: r1650661 - /devicemap/branches/2.0/data/README_PATTERNS
Date Fri, 09 Jan 2015 20:53:23 GMT
Author: rezan
Date: Fri Jan  9 20:53:23 2015
New Revision: 1650661

URL: http://svn.apache.org/r1650661
Log:
wiki

Modified:
    devicemap/branches/2.0/data/README_PATTERNS

Modified: devicemap/branches/2.0/data/README_PATTERNS
URL: http://svn.apache.org/viewvc/devicemap/branches/2.0/data/README_PATTERNS?rev=1650661&r1=1650660&r2=1650661&view=diff
==============================================================================
--- devicemap/branches/2.0/data/README_PATTERNS (original)
+++ devicemap/branches/2.0/data/README_PATTERNS Fri Jan  9 20:53:23 2015
@@ -1,3 +1,5 @@
+<<TableOfContents(2)>>
+
 = Pattern Specification 2.0 =
 Draft 1, 2014-01-09
 
@@ -5,7 +7,7 @@ This is the DeviceMap data specification
 
 All encodings in this document are UTF8.
 
-==== Overview ====
+=== Overview ===
 
 This document goes over how the DeviceMap data domains are defined and how the
 classifiers will process user input against the domains.
@@ -18,23 +20,23 @@ The classification process is broken dow
 
 The following definitions are used:
 
-input string::
-::this is the string to be classified
+ input string::
+ :: this is the string to be classified
 
-token stream::
-::this is the list of tokens that result from the Input Parsing phase
+ token stream::
+ :: this is the list of tokens that result from the Input Parsing phase
 
-pattern::
-::this is a complete pattern definition with an id, type, rank, and pattern tokens
+ pattern::
+ :: this is a complete pattern definition with an id, type, rank, and pattern tokens
 
-pattern tokens::
-::these are the individual pattern strings which comprise a pattern
+ pattern tokens::
+ :: these are the individual pattern strings which comprise a pattern
 
-pattern type::
-::this defines how the pattern tokens must appear in the input string for the pattern to be valid
+ pattern type::
+ :: this defines how the pattern tokens must appear in the input string for the pattern to be valid
 
-matched tokens::
-::these are pattern tokens which are successfully matched in the token stream
+ matched tokens::
+ :: these are pattern tokens which are successfully matched in the token stream
 
 The pattern and attribute files are JSON objects. These objects will contain:
 
@@ -46,31 +48,31 @@ The pattern and attribute files are JSON
 
 
 
-== Input Parsing ==
+= Input Parsing =
 
 This step parses the input string and creates the token stream.
 
 Each pattern file defines the domain input parsing rules:
 
-input transformers::
-::Type: list of transformation steps
-::Optional. Default: none
-::TODO: define what exactly these can be.
-
-token separators::
-::Type: list of token seperator strings
-::Optional. Default: none
-
-ngram concatenation size::
-::Type: greater than zero integer
-::Optional. Default: 1
+ input transformers::
+ :: Type: list of transformation steps
+ :: Optional. Default: none
+ :: TODO: define what exactly these can be.
+
+ token separators::
+ :: Type: list of token separator strings
+ :: Optional. Default: none
+
+ ngram concatenation size::
+ :: Type: integer greater than zero
+ :: Optional. Default: 1
 
 The input string is first processed through the transformers.
 Then it is tokenized using the configured separators. Then ngram
 concatenation happens. The final result of these 3 steps is the token stream.
 
 
-==== Notes ====
+=== Notes ===
 
 Empty tokens produced by the tokenization step are removed.
 
@@ -82,7 +84,7 @@ If the ngram concatenation size is great
 made first before creating the smaller ngrams.
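A minimal Python sketch of the three Input Parsing steps, assuming transformers can be modeled as plain `str -> str` functions (the spec still marks their exact form as TODO) and that ngram concatenation joins tokens with no separator, which is inferred from the `a_NUM` token in the example. The names `tokenize`, `ngram_concatenate`, `parse_input` and `to_num` are hypothetical. Empty tokens are dropped after splitting, and at each position the largest ngram is emitted before the smaller ones, as the notes describe.

```python
import re
from typing import Callable, List

# Hypothetical: a transformer is modeled as a plain str -> str function.
Transformer = Callable[[str], str]


def tokenize(text: str, separators: List[str]) -> List[str]:
    """Split text on any separator string and drop empty tokens."""
    seps = [s for s in separators if s]
    if not seps:
        tokens = [text]
    else:
        # Try longer separators first so a separator containing another one wins.
        alternatives = sorted(seps, key=len, reverse=True)
        tokens = re.split("|".join(re.escape(s) for s in alternatives), text)
    return [t for t in tokens if t]


def ngram_concatenate(tokens: List[str], size: int) -> List[str]:
    """At each position, emit the largest ngram first, then the smaller ones."""
    if size < 1:
        raise ValueError("ngram concatenation size must be greater than zero")
    stream: List[str] = []
    for i in range(len(tokens)):
        for n in range(size, 0, -1):
            if i + n <= len(tokens):
                stream.append("".join(tokens[i:i + n]))
    return stream


def parse_input(text: str, transformers: List[Transformer],
                separators: List[str], ngram_size: int = 1) -> List[str]:
    """Transform, tokenize, then concatenate ngrams to build the token stream."""
    for transform in transformers:
        text = transform(text)
    return ngram_concatenate(tokenize(text, separators), ngram_size)


def to_num(s: str) -> str:
    """Hypothetical transformer mirroring the example's [0-9]+ => _NUM rule."""
    return re.sub(r"[0-9]+", "_NUM", s)


print(parse_input("A 12 b", [str.lower, to_num], [" "], ngram_size=2))
# ['a_NUM', 'a', '_NUMb', '_NUM', 'b']
```

Applying `str.lower` before the `_NUM` substitution keeps the replacement token upper case, which is consistent with the `a_NUM, a, ...` start of the example stream.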
 
 
-==== Example ====
+=== Example ===
 
 {{{
 Transformer: lowercase, [0-9]+ => _NUM
@@ -100,10 +102,10 @@ Post ngram (token stream): a_NUM, a, _NU
 
 
 
-== Pattern Matching ==
+= Pattern Matching =
 
 This step processes the token stream and picks the highest ranking pattern which
-matches on the stream..
+matches on the stream.
 
 The pattern file defines a set of patterns. Each pattern has 2 main attributes:
 its pattern type and its pattern rank. The pattern
@@ -123,71 +125,59 @@ support the pattern set.
 
 Pattern attributes:
 
-PatternId::
-::Type: String
-::Required.
-
-RankType::
-::Type: string
-::Required.
-
-RankValue::
-::Type: integer
-::Optional. Default: 0.
-::Use defined by RankType.
-
-PatternType::
-::Type: string
-::Required.
-
-PatternTokens::
-::Type: list of pattern token strings
-::Required.
-
-Default::
-::Type: boolean
-::Optional. Default: false.
-::Only 1 pattern can have a true value of false.
+ PatternId::
+ :: Type: string
+ :: Required.
+
+ RankType::
+ :: Type: string
+ :: Required.
+
+ RankValue::
+ :: Type: integer
+ :: Optional. Default: 0.
+ :: Its meaning is defined by the RankType.
+
+ PatternType::
+ :: Type: string
+ :: Required.
+
+ PatternTokens::
+ :: Type: list of pattern token strings
+ :: Required.
+
+ Default::
+ :: Type: boolean
+ :: Optional. Default: false.
+ :: Only 1 pattern can have a value of true.
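A minimal Python sketch of loading one pattern definition, assuming each pattern is a decoded JSON object whose keys are exactly the attribute names listed above. The `Pattern` class and `pattern_from_json` function are hypothetical. The sketch checks the required attributes and applies the documented defaults (RankValue 0, Default false); it does not validate RankType or PatternType values.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

REQUIRED = ("PatternId", "RankType", "PatternType", "PatternTokens")


@dataclass
class Pattern:
    pattern_id: str
    rank_type: str
    pattern_type: str
    pattern_tokens: List[str]
    rank_value: int = 0     # RankValue is optional, default 0
    default: bool = False   # Default is optional, default false


def pattern_from_json(obj: Dict[str, Any]) -> Pattern:
    """Build a Pattern from one decoded JSON object, applying the documented defaults."""
    missing = [key for key in REQUIRED if key not in obj]
    if missing:
        raise ValueError(f"missing required attribute(s): {', '.join(missing)}")
    return Pattern(
        pattern_id=str(obj["PatternId"]),
        rank_type=str(obj["RankType"]),
        pattern_type=str(obj["PatternType"]),
        pattern_tokens=[str(t) for t in obj["PatternTokens"]],
        rank_value=int(obj.get("RankValue", 0)),
        default=bool(obj.get("Default", False)),
    )
```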
 
 
-==== PatternType ====
+== PatternType ==
 
 The following pattern types are defined:
 
-SimpleOrderedAnd::
-::Each pattern token must appear in the token stream in index orderi, as defined
-in the PatternTokens list. Its okay for non matched tokens to appear inbetween
-matched tokens as long as the matched tokens are still in order.
+ SimpleOrderedAnd::
+ :: Each pattern token must appear in the token stream in index order, as defined in the PatternTokens list. It's okay for non-matched tokens to appear in between matched tokens as long as the matched tokens are still in order.
 
-SimpleAnd::
-::Each pattern token must appear in the token stream. Order does not matter.
+ SimpleAnd::
+ :: Each pattern token must appear in the token stream. Order does not matter.
 
-Simple::
-::Only one pattern must appear in the token stream.
+ Simple::
+ :: Only one of the pattern tokens needs to appear in the token stream.
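A minimal Python sketch of the three pattern types, assuming a pattern token matches a stream token only on exact string equality (the spec does not say how tokens are compared) and reading Simple as "at least one pattern token appears". The function names and the `MATCHERS` table are hypothetical. The ordered check walks a single iterator over the stream, so unmatched tokens may sit between matched ones but the matched tokens must keep their PatternTokens order.

```python
from typing import Callable, Dict, List


def simple_ordered_and(tokens: List[str], stream: List[str]) -> bool:
    """All pattern tokens appear in the stream in PatternTokens order; gaps are allowed."""
    it = iter(stream)
    # `tok in it` consumes the iterator up to the match, which enforces ordering.
    return all(tok in it for tok in tokens)


def simple_and(tokens: List[str], stream: List[str]) -> bool:
    """All pattern tokens appear in the stream, in any order."""
    present = set(stream)
    return all(tok in present for tok in tokens)


def simple(tokens: List[str], stream: List[str]) -> bool:
    """At least one pattern token appears in the stream."""
    present = set(stream)
    return any(tok in present for tok in tokens)


MATCHERS: Dict[str, Callable[[List[str], List[str]], bool]] = {
    "SimpleOrderedAnd": simple_ordered_and,
    "SimpleAnd": simple_and,
    "Simple": simple,
}


def matches(pattern_type: str, tokens: List[str], stream: List[str]) -> bool:
    return MATCHERS[pattern_type](tokens, stream)
```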
 
 
-==== RankType ====
+== RankType ==
 
 The following rank types are defined:
 
-Strong::
-::Strong patterns are ranked higher than Weak and None. The RankValue
-is ignored and they are ranked by their position
-in the pattern stream. The lower the position, the higher the rank.
-When a Strong pattern is found, the pattern matching step can stop and
-this pattern can be returned without analyzing the rest of the stream.
-This is because its impossible for another pattern to rank higher.
-
-Weak::
-::Weak patterns are ranked below Strong but above None. A Weak pattern can only
-be returned in the absence of a Strong pattern. Weak patterns always rank higher
-than None patterns, regardless of the RankValue. The RankValue is used to rank
-between successfully matched Weak patterns.
-
-None::
-::None patterns are ranked below Strong and Weak. A None pattern can only be
-returned in the absence of successful Strong and Weak patterns. The RankValue
-is used to rank between successfully matched None patterns.
+ Strong::
+ :: Strong patterns are ranked higher than Weak and None. The RankValue is ignored and they are ranked by their position in the pattern stream. The lower the position, the higher the rank. When a Strong pattern is found, the pattern matching step can stop and this pattern can be returned without analyzing the rest of the stream. This is because it's impossible for another pattern to rank higher.
+
+ Weak::
+ :: Weak patterns are ranked below Strong but above None. A Weak pattern can only be returned in the absence of a Strong pattern. Weak patterns always rank higher than None patterns, regardless of the RankValue. The RankValue is used to rank between successfully matched Weak patterns.
+
+ None::
+ :: None patterns are ranked below Strong and Weak. A None pattern can only be returned in the absence of successful Strong and Weak patterns. The RankValue is used to rank between successfully matched None patterns.
 
 In the case where 2 or more Weak or None patterns have the same RankValue resulting in a tie,
 the pattern with the longest concatenated matched pattern length is used. If that results in
@@ -196,7 +186,7 @@ another tie, the pattern found first is
 If no pattern is successfully matched, the default pattern is returned. If no
 default pattern is defined, a null pattern is returned.
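A minimal Python sketch of picking the winning pattern, under several assumptions the text leaves open: candidates are given in pattern-file order, "position" for Strong patterns means that order, a higher RankValue ranks higher, and "concatenated matched pattern length" is the length of the matched tokens joined together. The `Candidate` class and `select` function are hypothetical. If several defaults exist, this sketch simply takes the first one.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Lower number = stronger tier.
RANK_ORDER = {"Strong": 0, "Weak": 1, "None": 2}


@dataclass
class Candidate:
    pattern_id: str
    rank_type: str                 # "Strong", "Weak" or "None"
    rank_value: int = 0
    matched_tokens: List[str] = field(default_factory=list)  # empty list = no match
    default: bool = False


def select(candidates: List[Candidate]) -> Optional[Candidate]:
    best: Optional[Candidate] = None
    best_key = None
    for index, c in enumerate(candidates):
        if not c.matched_tokens:
            continue
        if c.rank_type == "Strong":
            return c  # nothing can outrank the first matching Strong pattern
        matched_len = len("".join(c.matched_tokens))
        # Smaller key wins: tier, then higher RankValue, then longer match, then earlier position.
        key = (RANK_ORDER[c.rank_type], -c.rank_value, -matched_len, index)
        if best_key is None or key < best_key:
            best, best_key = c, key
    if best is not None:
        return best
    # No match: fall back to a default pattern, else None (the "null pattern").
    return next((c for c in candidates if c.default), None)
```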
 
-==== Notes ====
+=== Notes ===
 
 If 2 or more patterns share the same PatternId, then only 1 of their PatternTypes
 needs to match. There is an implied OR between multiple PatternTypes with equal PatternId.
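A minimal Python sketch of the implied OR, assuming each pattern entry is a `(pattern_id, pattern_type, pattern_tokens)` tuple and that a per-type matcher is supplied by the caller. The function `matching_ids` and the tuple layout are hypothetical. Entries are grouped by PatternId, and an id counts as matched if any of its entries matches.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: (pattern_id, pattern_type, pattern_tokens).
Entry = Tuple[str, str, List[str]]
Matcher = Callable[[str, List[str], List[str]], bool]


def matching_ids(entries: List[Entry], stream: List[str], matches: Matcher) -> List[str]:
    """Return the PatternIds for which at least one entry matches the stream."""
    by_id: Dict[str, List[Entry]] = defaultdict(list)
    for entry in entries:
        by_id[entry[0]].append(entry)
    # dicts keep insertion order, so ids come back in first-seen order.
    return [pid for pid, group in by_id.items()
            if any(matches(ptype, tokens, stream) for _, ptype, tokens in group)]
```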
@@ -207,7 +197,7 @@ If more than 1 default is defined, the 1
 found at the same time, the pattern the classifier chooses is undefined.
 
 
-==== Examples ====
+=== Examples ===
 
 {{{
 Pattern:
@@ -242,21 +232,21 @@ Pattern: p1
 
 
 
-== Attribute Retrieval ==
+= Attribute Retrieval =
 
 This step processes the result of the Pattern Matching step. The PatternId is used
 to look up the corresponding attribute map. The PatternId and the attribute map
 are returned.
 
 
-==== Attribute Parsing ====
+=== Attribute Parsing ===
 
 An attribute map can contain attributes which are parsed out of the input string.
 
 TODO: define this more
 
 
-==== Notes ====
+=== Notes ===
 
 If no attribute map is found, an empty map is used.
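A minimal Python sketch of Attribute Retrieval, assuming attribute maps are held in a dict keyed by PatternId with string values. The function `retrieve_attributes` is hypothetical. Treating a null pattern from the matching step as "no id, empty map" is also an assumption, since the text does not cover that case. The returned map is a copy so callers cannot mutate the shared data.

```python
from typing import Dict, Optional, Tuple

AttributeMap = Dict[str, str]


def retrieve_attributes(pattern_id: Optional[str],
                        attributes: Dict[str, AttributeMap]) -> Tuple[Optional[str], AttributeMap]:
    """Return the PatternId with its attribute map, or an empty map if none is found."""
    if pattern_id is None:
        return None, {}
    return pattern_id, dict(attributes.get(pattern_id, {}))
```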
 


