devicemap-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Devicemap Wiki] Update of "DataSpec2" by rezan
Date Wed, 14 Jan 2015 09:28:26 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Devicemap Wiki" for change notification.

The "DataSpec2" page has been changed by rezan:
https://wiki.apache.org/devicemap/DataSpec2?action=diff&rev1=29&rev2=30

  <<TableOfContents(2)>>
  
  = Data Specification 2.0 =
- Draft 1, 2014-01-12
+ Draft 1, 2014-01-14
  
  This is the Device``Map Data specification for patterns and attributes.
  
@@ -76, +76 @@

  Each pattern file defines these input parsing rules:
  
   InputTransformers::
-  :: Type: list of transformation steps
+  :: Type: list of transformers
   :: Optional. Default: none
-  :: TODO: define what exactly these can be. Basic? Freeform? Regex? ...?
  
   TokenSeparators::
   :: Type: list of token separator strings
@@ -107, +106 @@

  === Example ===
  
  {{{
- InputTransformers: lowercase, s/[0-9]+/_NUM/g, s/-//g
+ InputTransformers: lowercase, ReplaceAll(Find: '-', Replace: '')
  TokenSeparators:   [space]
  NgramConcatSize:   2
  
- Input string:  A 12 x-yZ
+ Input string:  'A 12 x-yZ'
  
- Transform:     a _NUM xyz
+ Transform:     'a 12 xyz'
  
- Tokenization:  a, _NUM, xyz
+ Tokenization:  a, 12, xyz
  
- Ngram:         a_NUM, a, _NUMxyz, _NUM, xyz
+ Ngram:         a12, a, 12xyz, 12, xyz
  }}}
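
The example above can be reproduced with a short Python sketch. This is only an illustration under stated assumptions, not part of the specification: the function names (`lowercase`, `replace_all`, `tokenize`, `ngrams`, `parse_input`) are hypothetical, and the ngram ordering (longest concatenation first at each token position) is inferred from the example output.

```python
from typing import Callable, List

# Assumption: a transformer maps one string to another string.
Transformer = Callable[[str], str]


def lowercase(text: str) -> str:
    """Hypothetical stand-in for the 'lowercase' transformer."""
    return text.lower()


def replace_all(find: str, replace: str) -> Transformer:
    """Hypothetical stand-in for ReplaceAll(Find: ..., Replace: ...)."""
    def apply(text: str) -> str:
        return text.replace(find, replace)
    return apply


def tokenize(text: str, separators: List[str]) -> List[str]:
    """Split on every separator string and drop empty tokens."""
    tokens = [text]
    for sep in separators:
        tokens = [part for token in tokens for part in token.split(sep)]
    return [token for token in tokens if token]


def ngrams(tokens: List[str], concat_size: int) -> List[str]:
    """At each position, emit the longest concatenation first, then shorter ones."""
    result = []
    for start in range(len(tokens)):
        for size in range(concat_size, 0, -1):
            if start + size <= len(tokens):
                result.append("".join(tokens[start:start + size]))
    return result


def parse_input(text: str, transformers: List[Transformer],
                separators: List[str], concat_size: int) -> List[str]:
    """Apply the transformers in order, then tokenize, then build ngrams."""
    for transform in transformers:
        text = transform(text)
    return ngrams(tokenize(text, separators), concat_size)


if __name__ == "__main__":
    print(parse_input("A 12 x-yZ", [lowercase, replace_all("-", "")], [" "], 2))
```

With the settings from the example, the sketch yields the same ngram list: a12, a, 12xyz, 12, xyz.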
  
  
@@ -192, +191 @@

   :: Strong patterns are ranked higher than Weak and None. The Rank``Value is ignored and
they are ranked by their position in the pattern stream, specifically the last matched token
position. The lower the position, the higher the rank. When a Strong pattern is found, the
pattern matching step can stop and this pattern can be returned without analyzing the rest
of the stream. This is because it is impossible for another pattern to rank higher.
  
   Weak::
-  :: Weak patterns are ranked below Strong but above None. A Weak pattern can only be returned
in the absence of a Strong candidate. Weak patterns always rank higher than None patterns,
regardless of their Rank``Value. The Rank``Value is used to rank between other Weak patterns.
+  :: Weak patterns are ranked below Strong but above None. A Weak candidate can only be returned
in the absence of a Strong candidate. Weak candidates always rank higher than None candidates,
regardless of their Rank``Value. The Rank``Value is used to rank between other Weak patterns.
  
   None::
-  :: None patterns are ranked below Strong and Weak. A None pattern can only be returned
in the absence of Strong and Weak candidates. The Rank``Value is used to rank between other
None patterns.
+  :: None patterns are ranked below Strong and Weak. A None candidate can only be returned
in the absence of Strong and Weak candidates. The Rank``Value is used to rank between other
None patterns.
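
The three rank types can be sketched as a sort key in Python. This is a rough illustration, not the specified algorithm: the `Candidate` class and its field names are hypothetical, the sketch assumes that a larger Rank``Value ranks higher (the text here does not state the direction), and tie-breaking between equal candidates is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Strong outranks Weak, and Weak outranks None, regardless of RankValue.
RANK_TYPE_ORDER = {"Strong": 0, "Weak": 1, "None": 2}


@dataclass
class Candidate:
    """Hypothetical container for a matched pattern."""
    pattern_id: str
    rank_type: str            # "Strong", "Weak" or "None"
    rank_value: int           # ignored for Strong candidates
    last_token_position: int  # position of the last matched token


def sort_key(candidate: Candidate) -> Tuple[int, int]:
    """Smaller keys rank higher."""
    tier = RANK_TYPE_ORDER[candidate.rank_type]
    if candidate.rank_type == "Strong":
        # Strong: the lower the last matched token position, the higher the rank.
        return (tier, candidate.last_token_position)
    # Weak and None: assumption that a larger RankValue ranks higher.
    return (tier, -candidate.rank_value)


def best_candidate(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the highest-ranked candidate, or None if there are no candidates."""
    if not candidates:
        return None
    return min(candidates, key=sort_key)
```

A streaming matcher would not need to collect all candidates first: as the Strong entry notes, it can return as soon as a Strong pattern is found.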
  
  In the case where 2 or more candidates have the same Rank``Type and Rank``Value resulting
in a tie,
  the candidate with the longest concatenated matched pattern length is used. If that results
in
@@ -221, +220 @@

    PatternId: p1
    RankType: Strong
    PatternType: Simple
-   PatternTokens: bingo, jackpot
+   PatternTokens: bingo,jackpot
  
  Pattern:
    PatternId: p2
@@ -259, +258 @@

  === Attribute Parsing ===
  
  An attribute map can contain attribute values which are parsed out of the input string.
- 
- TODO: define this more
+ This is done by configuring the attribute as a set of transformers. The attribute can also
+ have a default value, which is used if the transformers return an error.
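
A minimal Python sketch of this idea follows. Everything in it is an assumption made for illustration: the specification does not yet define how transformers signal an error, so the sketch uses a hypothetical `TransformerError` exception, a hypothetical `after` transformer, and a hypothetical `parse_attribute` helper.

```python
from typing import Callable, List, Optional

# Assumption: a transformer maps one string to another string.
Transformer = Callable[[str], str]


class TransformerError(Exception):
    """Hypothetical error type raised when a transformer cannot be applied."""


def after(marker: str) -> Transformer:
    """Hypothetical transformer: keep the text following the first marker."""
    def apply(text: str) -> str:
        index = text.find(marker)
        if index < 0:
            raise TransformerError(f"marker {marker!r} not found")
        return text[index + len(marker):]
    return apply


def parse_attribute(input_string: str,
                    transformers: List[Transformer],
                    default: Optional[str] = None) -> Optional[str]:
    """Run the transformers in order; fall back to the default on an error."""
    value = input_string
    try:
        for transform in transformers:
            value = transform(value)
    except TransformerError:
        return default
    return value
```

For instance, `parse_attribute("Model/42", [after("/")])` gives `"42"`, while `parse_attribute("Model", [after("/")], default="unknown")` falls back to `"unknown"`.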
  
  
  === Notes ===
  
- If no attribute map is found, an empty map is used.
+ If no attribute map is found, an empty map can be used.
  
  If a null pattern is returned from the previous step, this must be safely returned.
  TODO: how?
+ 
+ 
+ 
+ = Transformers =
+ 
+ Transformers take in an input string, apply an action, and then return a string.
  
  
  
