devicemap-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Devicemap Wiki] Trivial Update of "Patterns2" by rezan
Date Fri, 09 Jan 2015 21:39:19 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Devicemap Wiki" for change notification.

The "Patterns2" page has been changed by rezan:
https://wiki.apache.org/devicemap/Patterns2?action=diff&rev1=2&rev2=3

Comment:
type

  Draft 1, 2014-01-09
  
  This is the DeviceMap data specification for patterns and attributes.
- 
- All encodings in this document are UTF8.
  
  === Overview ===
  
@@ -58, +56 @@

  
  Each pattern file defines the domain input parsing rules:
  
-  inputTransformers::
+  InputTransformers::
   :: Type: list of transformation steps
   :: Optional. Default: none
   :: TODO: define what exactly these can be.
  
-  tokenSeparators::
+  TokenSeparators::
   :: Type: list of token separator strings
   :: Optional. Default: none
  
-  ngramConcatSize::
+  NgramConcatSize::
   :: Type: integer greater than zero
   :: Optional. Default: 1
  
@@ -84, +82 @@

  pattern matching step before moving on to the next token. This algorithm is pipeline
  and thread safe.
  
- If the ngramConcatSize is greater than 1, the largest ngram must be
+ If the Ngram``Concat``Size is greater than 1, the largest ngram must be
  made first before creating the smaller ngrams.
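
The ordering rule above can be sketched in Python. This is an illustrative sketch, not the reference implementation: the function names `transform`, `tokenize`, and `ngrams` are hypothetical, and joining adjacent tokens with no separator between them is an assumption, since this excerpt does not say how ngrams are concatenated.

```python
import re


def transform(text):
    """Apply the two transformers from the example: lowercase, then [0-9]+ => _NUM."""
    return re.sub(r"[0-9]+", "_NUM", text.lower())


def tokenize(text, separators):
    """Split on any of the separator strings, dropping empty tokens."""
    if not separators:
        return [text] if text else []
    pattern = "|".join(re.escape(s) for s in separators)
    return [t for t in re.split(pattern, text) if t]


def ngrams(tokens, size):
    """For each token position, emit the largest ngram first, then the smaller ones.

    Assumption: tokens are concatenated with no separator between them.
    """
    out = []
    for i in range(len(tokens)):
        largest = min(size, len(tokens) - i)
        for n in range(largest, 0, -1):
            out.append("".join(tokens[i:i + n]))
    return out


tokens = tokenize(transform("A 12 xyZ"), [" "])
stream = ngrams(tokens, 2)
print(stream)
```

Under these assumptions, each position yields its 2-gram before its 1-gram, so a matcher consuming the stream sees the longest candidate for a position first.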
  
  
  === Example ===
  
  {{{
- inputTransformers: lowercase, [0-9]+ => _NUM
+ InputTransformers: lowercase, [0-9]+ => _NUM
- tokenSeparators:   [space]
+ TokenSeparators:   [space]
- ngramConcatSize:   2
+ NgramConcatSize:   2
  
  Input string:  A 12 xyZ
  
@@ -122, +120 @@

  
  All the pattern types are prefixed with 'Simple'. This means that each pattern token is matched
  using a plain UTF8 string comparison. No regex or other syntax is allowed in Simple patterns.
- This allows the algorithm to use simple string hashing for matching. This gives maximum performance and scaling complexity equal to a hashtable implementation. A Simple``HashCount attribute can be optionally defined which hints the classifier as to how many unique hashes it would need to generate to support the pattern set.
+ This allows the algorithm to use simple string hashing for matching. This gives maximum performance and scaling complexity equal to a hashtable implementation. A Simple``Hash``Count attribute can be optionally defined which hints the classifier as to how many unique hashes it would need to generate to support the pattern set.
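
A minimal Python sketch of hash-based Simple matching, assuming a pattern set can be modeled as a mapping from pattern id to its pattern tokens. The names `build_index` and `match` and the sample pattern ids are hypothetical and do not come from the specification.

```python
def build_index(patterns):
    """Map each pattern token to the set of pattern ids that contain it."""
    index = {}
    for pattern_id, pattern_tokens in patterns.items():
        for token in pattern_tokens:
            index.setdefault(token, set()).add(pattern_id)
    return index


def match(token_stream, index):
    """Look up each input token by exact string equality (one hash lookup per token)."""
    hits = []
    for token in token_stream:
        for pattern_id in sorted(index.get(token, ())):
            hits.append((token, pattern_id))
    return hits


# Hypothetical pattern set, for illustration only.
sample_patterns = {"p1": ["a_NUM"], "p2": ["xyz", "a"]}
sample_index = build_index(sample_patterns)
print(match(["a_NUM", "a", "zzz"], sample_index))
```

Because every lookup is an exact dictionary hit or miss, the cost per input token does not grow with the number of patterns, which is the hashtable-like scaling the paragraph describes. A Python `dict` cannot be presized, so a size hint such as SimpleHashCount has no direct counterpart in this sketch.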
  
  Pattern attributes:
  
@@ -149, +147 @@

   Default::
   :: Type: boolean
   :: Optional. Default: false.
-  :: Only 1 pattern can have a true value of false.
+  :: Only 1 pattern can have a true value.
  
  
  == PatternType ==
