devicemap-commits mailing list archives

From re...@apache.org
Subject svn commit: r1650669 - /devicemap/branches/2.0/data/README_PATTERNS
Date Fri, 09 Jan 2015 21:18:25 GMT
Author: rezan
Date: Fri Jan  9 21:18:24 2015
New Revision: 1650669

URL: http://svn.apache.org/r1650669
Log:
update

Modified:
    devicemap/branches/2.0/data/README_PATTERNS

Modified: devicemap/branches/2.0/data/README_PATTERNS
URL: http://svn.apache.org/viewvc/devicemap/branches/2.0/data/README_PATTERNS?rev=1650669&r1=1650668&r2=1650669&view=diff
==============================================================================
--- devicemap/branches/2.0/data/README_PATTERNS (original)
+++ devicemap/branches/2.0/data/README_PATTERNS Fri Jan  9 21:18:24 2015
@@ -46,6 +46,8 @@ The pattern and attribute files are JSON
  * Description
  * Publish date
 
+The objects will also contain the attributes defined below in this specification.
+
 
 
 = Input Parsing =
@@ -54,16 +56,16 @@ This step parses the input string and cr
 
 Each pattern file defines the domain input parsing rules:
 
- input transformers::
+ inputTransformers::
  :: Type: list of transformation steps
  :: Optional. Default: none
  :: TODO: define what exactly these can be.
 
- token separators::
+ tokenSeparators::
  :: Type: list of token seperator strings
  :: Optional. Default: none
 
- ngram concatenation size::
+ ngramConcatSize::
  :: Type: greater than zero integer
  :: Optional. Default: 1
 
@@ -80,24 +82,24 @@ When a token is created and added to the
 pattern matching step before moving on to the next token. This algorithm is pipeline
 and thread safe.
 
-If the ngram concatenation size is greater than 1, the largest ngram must be
+If the ngramConcatSize is greater than 1, the largest ngram must be
 made first before creating the smaller ngrams.
 
 
 === Example ===
 
 {{{
-Transformer: lowercase, [0-9]+ => _NUM
-Token separators: [space]
-ngram concatenation size: 2
+inputTransformers: lowercase, [0-9]+ => _NUM
+tokenSeparators:   [space]
+ngramConcatSize:   2
 
-input string: A 12 xyZ
+Input string:  A 12 xyZ
 
-Post transform: a _NUM xyz
+Transform:     a _NUM xyz
 
-Post tokenization: a, _NUM, xyz
+Tokenization:  a, _NUM, xyz
 
-Post ngram (token stream): a_NUM, a, _NUMxyz, _NUM, xyz
+Ngram:         a_NUM, a, _NUMxyz, _NUM, xyz
 }}}
 
 
@@ -118,10 +120,7 @@ and the highest ranking pattern is retur
 
 All the pattern types are prefixed with 'Simple'. This means that each pattern token is matched
 using a plain UTF8 string comparison. No regex or other syntax is allowed in Simple patterns.
-This allows the algorithm us use string hashing for matching. This gives maximum performance
-and scaling complexity equal to a hashtable implementation. A SimpleHashCount attribute can
-be defined which hints the classifier as to how many unique hashes it would need to generate to
-support the pattern set.
+This allows the algorithm to use simple string hashing for matching. This gives maximum performance and scaling complexity equal to a hashtable implementation. A Simple``HashCount attribute can be optionally defined which hints the classifier as to how many unique hashes it would need to generate to support the pattern set.
 
 Pattern attributes:
 
@@ -136,7 +135,6 @@ Pattern attributes:
  RankValue::
  :: Type: integer
  :: Optional. Default: 0.
- :: Use defined by RankType.
 
  PatternType::
  :: Type: string
@@ -157,7 +155,7 @@ Pattern attributes:
 The following pattern types are defined:
 
  SimpleOrderedAnd::
- :: Each pattern token must appear in the token stream in index order, as defined in the PatternTokens list. Its okay for non matched tokens to appear inbetween matched tokens as long as the matched tokens are still in order.
+ :: Each pattern token must appear in the token stream in index order, as defined in the Pattern``Tokens list. Its okay for non matched tokens to appear inbetween matched tokens as long as the matched tokens are still in order.
 
  SimpleAnd::
  :: Each pattern token must appear in the token stream. Order does not matter.
@@ -171,15 +169,15 @@ The following pattern types are defined:
 The following rank types are defined:
 
  Strong::
- :: Strong patterns are ranked higher than Weak and None. The RankValue is ignored and they are ranked by their position in the pattern stream. The lower the position, the higher the rank. When a Strong pattern is found, the pattern matching step can stop and this pattern can be returned without analyzing the rest of the stream. This is because its impossible for another pattern to rank higher.
+ :: Strong patterns are ranked higher than Weak and None. The Rank``Value is ignored and they are ranked by their position in the pattern stream. The lower the position, the higher the rank. When a Strong pattern is found, the pattern matching step can stop and this pattern can be returned without analyzing the rest of the stream. This is because its impossible for another pattern to rank higher.
 
  Weak::
- :: Weak patterns are ranked below Strong but above None. A Weak pattern can only be returned in the absence of a Strong pattern. Weak patterns always rank higher than None patterns, regardless of the RankValue. The RankValue is used to rank between successfully matched Weak patterns.
+ :: Weak patterns are ranked below Strong but above None. A Weak pattern can only be returned in the absence of a Strong pattern. Weak patterns always rank higher than None patterns, regardless of their Rank``Value. The Rank``Value is used to rank between successfully matched Weak patterns.
 
  None::
- :: None patterns are ranked below Strong and Weak. A None pattern can only be returned in the absence of successful Strong and Weak patterns. The RankValue is used to rank between successfully matched None patterns.
+ :: None patterns are ranked below Strong and Weak. A None pattern can only be returned in the absence of successful Strong and Weak patterns. The Rank``Value is used to rank between successfully matched None patterns.
 
-In the case where 2 or more Weak or None patterns have the same RankValue resulting in a tie,
+In the case where 2 or more Weak or None patterns have the same Rank``Value resulting in a tie,
 the pattern with the longest concatenated matched pattern length is used. If that results in
 another tie, the pattern found first is returned.
 
@@ -188,12 +186,12 @@ default pattern is defined, a null patte
 
 === Notes ===
 
-If 2 or more patterns share the same PatternId, then only 1 of their PatternTypes
-need to match. There is an implied OR between multiple PatternTypes with equal PatternId.
+If 2 or more patterns share the same Pattern``Id, then only 1 of their Pattern``Types
+need to match. There is an implied OR between multiple Pattern``Types with equal Pattern``Id.
 
 If more than 1 default is defined, the 1st one found in the Pattern file is used.
 
-2 or more patterns cannot have identical RankType, RankValue, and matched tokens. Since they will be
+2 or more patterns cannot have identical Rank``Type, Rank``Value, and matched tokens. Since they will be
 found at the same time, the pattern the classifier chooses is undefined.
 
 
@@ -234,8 +232,8 @@ Pattern: p1
 
 = Attribute Retrieval =
 
-This step processes the result of the Pattern Matching step. The PatternId is used
-to look up the corresponding attribute map. The patternId and the attribute map
+This step processes the result of the Pattern Matching step. The Pattern``Id is used
+to look up the corresponding attribute map. The Pattern``Id and the attribute map
 are returned.
 
 
@@ -252,6 +250,6 @@ If no attribute map is found, an empty m
 
 The attribute map must be immutable.
 
-If a null pattern is returned from the previous step, this must be safely signaled back.
+If a null pattern is returned from the previous step, this must be safely returned.
 TODO: how?
 
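
The Input Parsing example touched by this commit (inputTransformers, tokenSeparators, ngramConcatSize) can be sketched in a few lines of Python. This is only an illustration and not DeviceMap code: the function name parse_input and the modelling of transformers as plain callables are assumptions, and because the README still marks the transformer definition as a TODO, only the two transformers from the example are shown. The ngram order follows the README rule that the largest ngram is made first.

```python
import re


def parse_input(text, transformers, separators, ngram_size=1):
    """Hypothetical sketch of the Input Parsing step; returns the token stream."""
    # Apply the transformers in the order they are listed.
    for transform in transformers:
        text = transform(text)

    # Split on any of the separator strings and drop empty tokens.
    if separators:
        pattern = "|".join(re.escape(sep) for sep in separators)
        tokens = [tok for tok in re.split(pattern, text) if tok]
    else:
        tokens = [text] if text else []

    # At each token position emit the largest ngram first, then the smaller ones.
    stream = []
    for start in range(len(tokens)):
        for size in range(ngram_size, 0, -1):
            if start + size <= len(tokens):
                stream.append("".join(tokens[start:start + size]))
    return stream


# The two transformers from the README example. Order matters: lowercasing
# must run before the _NUM substitution, or _NUM itself would be lowercased.
example_transformers = [
    str.lower,
    lambda s: re.sub(r"[0-9]+", "_NUM", s),
]

stream = parse_input("A 12 xyZ", example_transformers, [" "], ngram_size=2)
print(stream)  # ['a_NUM', 'a', '_NUMxyz', '_NUM', 'xyz']
```

With the example settings this reproduces the "Ngram" line of the README example. A real classifier would presumably hand each token to the pattern matching step as it is produced instead of building the whole list first.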


