devicemap-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Devicemap Wiki] Trivial Update of "Patterns2" by rezan
Date Mon, 12 Jan 2015 03:48:26 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Devicemap Wiki" for change notification.

The "Patterns2" page has been changed by rezan:
https://wiki.apache.org/devicemap/Patterns2?action=diff&rev1=21&rev2=22

  
  Empty tokens are removed from the tokenization step.
  
- When a token is created and added to the token stream, it can be processed by the
+ When a token is added to the token stream, it can be processed by the
  pattern matching step before moving on to the next token. This algorithm is pipeline
  and thread safe.
  
+ If the Ngram``Concat``Size is greater than 1, ngrams must be added to the token stream ordered largest to smallest.
- If the Ngram``Concat``Size is greater than 1, the largest ngram must be
- made first before creating the smaller ngrams.
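
The ordering rule in this hunk can be sketched as follows. This is only an illustration, not the DeviceMap implementation: the function name `ngram_stream`, the choice to start each ngram at the current token and extend it forward, and the concatenation without a separator are all assumptions made for the example.

```python
from typing import Iterator, List


def ngram_stream(tokens: List[str], ngram_concat_size: int = 1) -> Iterator[str]:
    """Yield tokens and ngrams, largest ngram first at each position.

    Assumption: an ngram starts at a token and concatenates the tokens
    that follow it, with no separator.
    """
    # Empty tokens are removed, as the tokenization step requires.
    tokens = [t for t in tokens if t]
    for start in range(len(tokens)):
        # Near the end of the input fewer tokens remain, so cap the size.
        largest = min(ngram_concat_size, len(tokens) - start)
        # Count down so the largest ngram enters the stream before smaller ones.
        for size in range(largest, 0, -1):
            yield "".join(tokens[start:start + size])


if __name__ == "__main__":
    print(list(ngram_stream(["a", "", "b", "c"], ngram_concat_size=2)))
```

Because the function is a generator, each ngram can be handed to the pattern matching step as soon as it is produced, which fits the statement that a token can be processed before moving on to the next one.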
  
  
  === Example ===
@@ -124, +123 @@

  
  = Pattern Matching =
  
- This step processes the token stream and returns the highest ranking pattern which
+ This step processes the token stream and returns the highest ranking candidate pattern.
- matches on the stream (highest ranking candidate).
  
  The pattern file defines a pattern set. Each pattern has 2 main attributes,
  its pattern type and its pattern rank. The pattern
  type defines how the pattern is supposed to be matched against the token stream.
  The pattern rank defines how the pattern ranks against other patterns.
  
+ All patterns in the pattern set are evaluated to find the pattern candidates.
- If the pattern type is successfully matched against the stream, it is now a candidate
- for being returned. Candidates are ranked against each other using the pattern ranking
- and the highest ranking pattern is returned.
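
A minimal sketch of the candidate selection described here, under stated assumptions: the `Pattern` class, its `matches` callable, and the rule that a numerically larger rank wins are hypothetical and stand in for whatever the pattern file and pattern types actually define.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Pattern:
    pattern_id: str                        # hypothetical identifier
    rank: int                              # assumption: larger value ranks higher
    matches: Callable[[List[str]], bool]   # stands in for the pattern type


def best_candidate(patterns: Iterable[Pattern], stream: List[str]) -> Optional[Pattern]:
    """Evaluate every pattern, keep the ones that match, return the top-ranked."""
    candidates = [p for p in patterns if p.matches(stream)]
    # With no candidates, None is returned instead of raising an error.
    return max(candidates, key=lambda p: p.rank, default=None)


if __name__ == "__main__":
    patterns = [
        Pattern("generic", 1, lambda s: "phone" in s),
        Pattern("specific", 5, lambda s: "iphone" in s and "6" in s),
    ]
    best = best_candidate(patterns, ["iphone", "6", "phone"])
    print(best.pattern_id if best else None)
```

How ties between equally ranked candidates are broken is not stated in this excerpt; the sketch simply keeps the first one encountered.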
  
  All the pattern types in 2.0 are prefixed with 'Simple'. This means that each pattern token is matched
  using a plain byte string comparison. No regex or other syntax is allowed in Simple patterns.
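
To illustrate what a plain byte string comparison implies, here is a small sketch. The function name `simple_token_matches` and the UTF-8 encoding are assumptions for the example; any normalization applied by earlier steps is not shown.

```python
def simple_token_matches(pattern_token: str, stream_token: str) -> bool:
    """Compare two tokens byte for byte, with no regex or wildcard handling."""
    # Assumption: both tokens are encoded as UTF-8 before comparison.
    return pattern_token.encode("utf-8") == stream_token.encode("utf-8")


if __name__ == "__main__":
    print(simple_token_matches("iphone", "iphone"))
    # The dot is an ordinary byte here, not a regex wildcard.
    print(simple_token_matches("i.hone", "iphone"))
```

Under this reading, characters that would be special in a regex are compared literally, and the comparison is case-sensitive unless the input was normalized beforehand.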
