lucene-java-user mailing list archives

From <oh...@cox.net>
Subject How to index IP addresses?
Date Thu, 01 Jan 1970 00:00:00 GMT
Hi,

I am trying to index information in some proprietary-formatted files.  

In particular, these files contain some IP addresses in dotted notation, e.g., aa.bb.cc.dd.

For my initial test, I have a Document implementation, and after I extract what I need into
a String named "Info", I do:

doc.add(new Field("contents", Info, Field.Store.YES, Field.Index.ANALYZED));

From looking at the resulting index using Luke, it appears that I am getting terms for
the full IP address string (e.g., "aa.bb.cc.dd"), but I am also getting terms for each octet
of each IP address string, e.g.:

aa
bb
cc
dd

I'm still just getting started with Lucene, but from the research that I've done, it seems
like Lucene is treating the "." in the dotted notation strings as "noise".  Is that correct?

If so, is there a way to get it not to do that?
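
To make the two outcomes concrete, here is a toy sketch in plain Java. It is not Lucene's analyzer code, and the class name, method names, and sample address are all made up for illustration. It only contrasts keeping the address as one term with splitting it at the dots, which are the two kinds of terms showing up in Luke.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only; this is NOT how Lucene's analyzers are implemented.
class IpTermSketch {

    // Keeps the whole value as one term, as an un-analyzed field would.
    static List<String> asSingleTerm(String value) {
        List<String> terms = new ArrayList<String>();
        terms.add(value);
        return terms;
    }

    // Splits the value at each dot, producing one term per octet.
    static List<String> splitOnDots(String value) {
        return new ArrayList<String>(Arrays.asList(value.split("\\.")));
    }

    public static void main(String[] args) {
        String ip = "192.168.0.1"; // hypothetical address, not from the real files
        System.out.println(asSingleTerm(ip)); // one term: the full address
        System.out.println(splitOnDots(ip));  // four terms: one per octet
    }
}
```

In the Field API used above, Field.Index.NOT_ANALYZED is the counterpart of Field.Index.ANALYZED that indexes a field's value as a single term. Whether that fits here probably depends on whether the addresses can be pulled out into a field of their own rather than left inside "contents".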

Thanks,
Jim
