lucene-java-user mailing list archives

From "Armasu, Narcis" <arm...@amazon.com>
Subject RE: How to index IP addresses?
Date Thu, 30 Jul 2009 13:41:01 GMT
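Index the addresses as keywords: use Field.Index.NOT_ANALYZED, which indexes the field value as a single term without passing it through the analyzer.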
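If the IP addresses are embedded in a larger text value such as Info, one option is to pull them out first and index each one as its own NOT_ANALYZED field, alongside the analyzed "contents" field. The sketch below assumes plain dotted-quad IPv4 addresses. The class `IpExtractor`, the method `extractIpv4` and the field name "ip" are hypothetical. The Lucene call appears only in a comment and uses the same `Field(String, String, Field.Store, Field.Index)` constructor as the code in the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Lucene): finds dotted-quad IPv4
 * addresses in free text so each can be indexed as one un-analyzed term.
 */
public class IpExtractor {

    // Four runs of 1-3 digits separated by dots, bounded by word boundaries.
    private static final Pattern DOTTED_QUAD =
            Pattern.compile("\\b(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\b");

    /** Returns the IPv4 addresses in text, skipping matches with an octet above 255. */
    public static List<String> extractIpv4(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = DOTTED_QUAD.matcher(text);
        while (m.find()) {
            boolean valid = true;
            for (int g = 1; g <= 4; g++) {
                if (Integer.parseInt(m.group(g)) > 255) {
                    valid = false;
                    break;
                }
            }
            if (valid) {
                result.add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String info = "src=10.0.0.1 dst=192.168.1.20 bogus=300.1.1.1";
        for (String ip : extractIpv4(info)) {
            // In Lucene 2.4-era code, each address could then be added as its
            // own un-analyzed field (field name "ip" is hypothetical):
            // doc.add(new Field("ip", ip, Field.Store.YES, Field.Index.NOT_ANALYZED));
            System.out.println(ip);
        }
    }
}
```

Run as-is, it prints 10.0.0.1 and 192.168.1.20; 300.1.1.1 is skipped because one octet exceeds 255. At search time, querying the un-analyzed field with a TermQuery (or parsing queries through a PerFieldAnalyzerWrapper that maps that field to KeywordAnalyzer) avoids splitting the query string back into octets.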

-----Original Message-----
From: ohaya@cox.net [mailto:ohaya@cox.net] 
Sent: Thursday, July 30, 2009 4:36 PM
To: java-user@lucene.apache.org
Subject: How to index IP addresses?

Hi,

I am trying to index information in some proprietary-formatted files.  

In particular, these files contain some IP addresses in dotted notation, e.g., aa.bb.cc.dd.

For my initial test, I have a Document implementation, and after I extract what I need into
a String named "Info", I do:

doc.add(new Field("contents", Info, Field.Store.YES, Field.Index.ANALYZED));

From looking at the resulting index using Luke, it appears that I am getting terms for the
full IP address string (e.g., "aa.bb.cc.dd"), but I am also getting terms for each octet of
each IP address string, e.g.:

aa
bb
cc
dd

I'm still just getting started with Lucene, but from the research that I've done, it seems
like Lucene is treating the "." in the dotted notation strings as "noise".  Is that correct?

If so, is there a way to get it not to do that?
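To make the question concrete, the sketch below contrasts two ways of turning "aa.bb.cc.dd" into terms. It is a toy written for illustration, not Lucene's analyzer code, and it does not try to reproduce Lucene's exact output (which, per the message, also kept the full address). The names `ToyTokenizers`, `splitOnPunctuation` and `keyword` are hypothetical. It only shows why a tokenizer that treats "." as a separator yields one term per octet, while keyword-style handling, which is what Field.Index.NOT_ANALYZED provides, keeps the whole string as one term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Toy illustration only, NOT Lucene's StandardAnalyzer: contrasts splitting
 * on punctuation with keeping the whole value as a single "keyword" term.
 */
public class ToyTokenizers {

    // Lowercases letters and splits on any character that is not a letter
    // or digit, so every '.' acts as a separator and is discarded.
    public static List<String> splitOnPunctuation(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                terms.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    // Keyword-style handling: the entire value becomes a single term.
    public static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        String ip = "aa.bb.cc.dd";
        System.out.println(splitOnPunctuation(ip)); // [aa, bb, cc, dd]
        System.out.println(keyword(ip));            // [aa.bb.cc.dd]
    }
}
```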

Thanks,
Jim

Amazon Development Center (Romania) S.R.L. registered office: 37 Lazar Street, floor 5, Iasi,
Iasi County, Iasi 700049, Romania. Registered in Romania. Registration number J40/12967/2005.