lucene-java-user mailing list archives

From José Tomás Atria <jtat...@gmail.com>
Subject Using POS payloads for chunking
Date Wed, 14 Jun 2017 20:29:36 GMT
Hello!

I'm not particularly familiar with Lucene's search API (I've been using
the library mostly as a dumb index rather than a search engine), but I am
almost certain that, using its payload capabilities, it would be trivial to
implement a regular chunker that looks for patterns in sequences of payloads.

(Trying not to be too pedantic: a regular chunker looks for 'chunks' based
on part-of-speech tags, e.g. noun phrases can be searched for with patterns
like "(DT)?(JJ)*(NN|NP)+", that is, an optional determiner and zero or
more adjectives preceding one or more nouns, etc.)
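
To make the idea concrete, here is a minimal sketch of such a chunker in plain Java, outside Lucene. It assumes the POS tags for a document are already available as a list ordered by token position; the class name RegularChunker and the method chunk are hypothetical names used only for illustration, and nothing here uses or implies a Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: a regular chunker over a plain list of POS tags.
class RegularChunker {
    // The pattern (DT)?(JJ)*(NN|NP)+ rewritten for a string in which every
    // tag is followed by a single space. The lookbehind forces a match to
    // start at a tag boundary, so "DT" cannot match inside a tag like "PDT".
    private static final Pattern NOUN_PHRASE =
        Pattern.compile("(?<![^ ])(?:DT )?(?:JJ )*(?:(?:NN|NP) )+");

    /** Returns half-open [start, end) token-position ranges of noun phrases. */
    static List<int[]> chunk(List<String> tags) {
        StringBuilder sb = new StringBuilder();
        // Maps a character offset in the joined string to a token position.
        Map<Integer, Integer> offsetToPosition = new HashMap<>();
        for (int i = 0; i < tags.size(); i++) {
            offsetToPosition.put(sb.length(), i);
            sb.append(tags.get(i)).append(' ');
        }
        // The offset just past the last tag maps to the end position.
        offsetToPosition.put(sb.length(), tags.size());

        List<int[]> chunks = new ArrayList<>();
        Matcher m = NOUN_PHRASE.matcher(sb);
        while (m.find()) {
            chunks.add(new int[] {
                offsetToPosition.get(m.start()),
                offsetToPosition.get(m.end())
            });
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Tags for something like "the big dog chased the tennis ball".
        List<String> tags = List.of("DT", "JJ", "NN", "VB", "DT", "NN", "NN");
        for (int[] c : chunk(tags)) {
            System.out.println("[" + c[0] + ", " + c[1] + ")");
        }
        // prints "[0, 3)" then "[4, 7)"
    }
}
```

Joining the tags into one string lets java.util.regex do the matching, at the cost of materializing the whole tag sequence for a document in memory; whether the same matching can be driven directly from payloads in the index is exactly the open question below.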

Assuming my index has POS tags encoded as payloads for each position, how
would one search for such patterns, irrespective of terms? I started
studying the spans search API, as this seemed like the natural place to
start, but I quickly got lost.

Any tips would be greatly appreciated, as would references to this kind of
thing; I'm sure someone must have tried something similar before...

thanks!
~jta
-- 

sent from a phone. please excuse terseness and tpyos.

enviado desde un teléfono. por favor disculpe la parquedad y los erroers.
