lucene-java-user mailing list archives

From "AJ Weber" <awe...@comcast.net>
Subject Re: Postcode/zipcode search
Date Tue, 06 May 2008 16:54:34 GMT
Maybe I'm oversimplifying it, and maybe this isn't what you desire, but...

What about breaking the postcode into two (or three) different fields?  Seems easy to parse
on the ingestion-side, as you just break the string at the "middle" space.  Then store "postal_area",
"postal_street", and optionally the original, full "postalcode".  (Probably do not need to
tokenize the first two, maybe the last one.)
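A rough sketch of that ingestion-side split, in plain Java with no Lucene calls. The class and method names are made up for illustration, and the only assumption is that the area and street parts are separated by whitespace. Each map entry would then be added to the document as its own field.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: turns a raw postcode into the field values to index.
final class PostcodeFields {

    static Map<String, String> split(String raw) {
        // Normalise case and collapse runs of whitespace to a single space.
        String code = raw.trim().toUpperCase(Locale.ROOT).replaceAll("\\s+", " ");
        Map<String, String> fields = new LinkedHashMap<>();
        int space = code.indexOf(' ');
        if (space < 0) {
            // Area-only record such as "NW10": there is no street part to store.
            fields.put("postal_area", code);
        } else {
            fields.put("postal_area", code.substring(0, space));
            fields.put("postal_street", code.substring(space + 1));
        }
        // Optionally keep the original, full value as well.
        fields.put("postalcode", code);
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("NW10 7NY"));
        System.out.println(split("nw10"));
    }
}
```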

Then, and here's where you may throw this idea out entirely, it depends on how your searching
application/page is set up.  You'd need to apply the values entered by the user appropriately.
If they enter 2-4 chars with no spaces, search on the "postal_area" field.  If they enter
more than 4 chars (including a space), you could, again, split the string at the space and search
on the two individual fields.
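Something like the following could do the search-side routing. It is only a sketch: it builds a query string in Lucene's classic query syntax (`field:term`, with `+` marking a required clause), the class name is hypothetical, and it uses the presence of a space rather than a character count to decide which fields to hit. It also assumes the two fields were indexed untokenized and upper-cased, so the string should be parsed with an analyzer that leaves the terms alone.

```java
import java.util.Locale;

// Hypothetical helper: maps user input to a query over the split fields.
final class PostcodeQuery {

    static String build(String input) {
        String code = input.trim().toUpperCase(Locale.ROOT).replaceAll("\\s+", " ");
        // Accept one or two alphanumeric groups only; this also keeps
        // query-syntax special characters out of the generated string.
        if (!code.matches("[A-Z0-9]+( [A-Z0-9]+)?")) {
            throw new IllegalArgumentException("Not a postcode-like input: " + input);
        }
        int space = code.indexOf(' ');
        if (space < 0) {
            // No space: treat the input as an area code.
            return "postal_area:" + code;
        }
        // Space present: both parts must match their own field.
        return "+postal_area:" + code.substring(0, space)
                + " +postal_street:" + code.substring(space + 1);
    }

    public static void main(String[] args) {
        System.out.println(build("NW10"));
        System.out.println(build("nw10 7ny"));
    }
}
```

Because both clauses are required, a search for "NW10 7NY" would no longer match "SE9 6NY" on the strength of the "NY" alone.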

If you kept the original, full "postalcode" field, you could always put a link on the search
results (or maybe only if zero results are returned) saying, "Didn't find what you're looking
for?  Click here to broaden your search!"  -- And in that case send the whole query-string
against the postalcode field.
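For the "broaden your search" link, a sketch along the same lines. It assumes the full "postalcode" field was indexed tokenized and that the query parser's default operator is OR, so the grouped query matches a record containing any one of the parts. The class name is again made up.

```java
import java.util.Locale;

// Hypothetical helper: builds the fallback query against the full field.
final class BroadPostcodeQuery {

    static String build(String input) {
        // Replace anything that is not a letter or digit with a space so no
        // query-syntax characters survive, then trim the ends.
        String terms = input.toUpperCase(Locale.ROOT)
                .replaceAll("[^A-Z0-9]+", " ")
                .trim();
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("Nothing to search for: " + input);
        }
        // field:(a b) applies the field to every term inside the parentheses.
        return "postalcode:(" + terms + ")";
    }

    public static void main(String[] args) {
        System.out.println(build("NW10 7NY"));
    }
}
```

If the field is analyzed with something that lower-cases terms, the parser should apply the same analyzer to this string, so the upper-casing here would be harmless.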

Dunno.  Just an idea.  Good Luck!

-AJ

  ----- Original Message ----- 
  From: Chris Mannion 
  To: java-user@lucene.apache.org 
  Sent: Tuesday, May 06, 2008 12:28 PM
  Subject: Postcode/zipcode search


  Hi all

  I've got a bit of a niggling problem with how one of my searches is working
  as opposed to how my users would like it to work.  We're indexing on UK
  postcodes, which are in the format of a 3 or 4 character area code followed
  by a 3 or 4 character street specific code, e.g. "NW10 7NY" or "M11 1LQ".
  We originally had the values being indexed as tokenized and used a very
  simple search string in the format "postcode:xxx xxx", with no grouping or
  boosting or fuzzy searching, just a straight search on whatever the user
  entered.  This had the benefit of finding exact matches to searches and
  allowing us to search just on the area part of the code to return all
  records with that area code, eg a search on "NW2" returning anything
  starting NW2, like "NW2 6TB", "NW2 1ER" etc etc.

  However, the downside to that was that searches could also return records
  only tenuously related to what was searched for, eg. a search for "NW10 7NY"
  would also return a record with a postcode "SE9 6NY" because of the slight
  match of the "NY".  Obviously this was technically correct but users
  complained because their searches were returning records from completely
  different areas.  Our first step to put this right was to take off the
  tokenization of the field, which we also weren't happy with so have
  continued to fiddle.

  The current status is as follows - we index the values by stripping out
  spaces and tokenizing them and use a KeywordAnalyzer.  In searching we also
  strip spaces from the search term entered and search with a
  KeywordAnalyzer.  Searches for full postcodes, e.g. "NW10 7NY" find all
  exact matches but also any full values that are partial matches (e.g. some
  records just have "NW10" as their postcode field and the "NW10 7NY" search
  pulls them back too), but searches for partial postcodes e.g. "NW10" still
  only find exact matches, e.g. they only pull back those records that have
  just "NW10" as their postcode, rather than anything *starting* with NW10 as
  we'd like them to do.

  Can anyone help me get this working in the way we need it to please?

  -- 
  Chris Mannion
  iCasework and LocalAlert implementation team
  0208 144 4416
