lucene-solr-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Shingles from WDFF
Date Fri, 24 Mar 2017 21:23:47 GMT
Hi all,

I’ve got some ancient Lucene tokenizer code from 2006 that I’m trying to avoid forward-porting,
but I don’t think there’s an equivalent in Solr 5/6.

Specifically, it applies shingles to the output of something like the WordDelimiterFilter.
For example, MySuperSink gets split into “My”, “Super”, “Sink”, and then shingled (with a
shingle size of 2) into “My”, “MySuper”, “Super”, “SuperSink”, “Sink”.

I can’t just follow the WDF with a shingle filter, because in my code shingles aren’t created
across terms coming into the WDF; they’re only built from the pieces the WDF generates from each term.
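
To make the desired output concrete, here is a minimal sketch in plain Python. It is not Lucene or Solr code and does not reproduce the WordDelimiterFilter's actual splitting rules; the function names (split_word_parts, shingle_parts, analyze) and the simplified case-change regex are assumptions made purely for illustration. It only shows that shingles are built within one incoming term and never across two.

```python
import re


def split_word_parts(token):
    """Split a token on case changes and digit runs.

    Assumption: a rough stand-in for word-delimiter splitting,
    not the real WordDelimiterFilter behaviour.
    """
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", token)


def shingle_parts(parts, max_size=2):
    """Concatenate runs of 1..max_size adjacent parts, in positional order."""
    shingles = []
    for start in range(len(parts)):
        for size in range(1, max_size + 1):
            if start + size <= len(parts):
                shingles.append("".join(parts[start:start + size]))
    return shingles


def analyze(text, max_size=2):
    """Shingle each whitespace-separated term on its own, never across terms."""
    terms = []
    for token in text.split():
        terms.extend(shingle_parts(split_word_parts(token), max_size))
    return terms


if __name__ == "__main__":
    print(analyze("MySuperSink drain"))
```

With the input "MySuperSink drain", the pieces of MySuperSink are shingled with each other, but no "Sinkdrain" term is produced, which is the behaviour a shingle filter placed directly after the WDF would not give.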

Or is there actually a way to make this work with Solr 5/6?

Thanks,

— Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr



