lucene-java-user mailing list archives

From Altimatic <>
Subject Finding frequency of regex query match in a field
Date Fri, 15 Jan 2010 14:14:22 GMT

Hi All, 

I have an application that has to count how many times a specific
regular expression matches in a particular field, for each document in an
indexed directory.

For example:

Let's say I have 2 documents in the directory, and each document has 3 fields:
"table", "column" and "data".

Example Doc(s):
Document doc1 = new Document();
doc1.add(new Field("table", "EMPLOYEE_US", Field.Store.NO,
doc1.add(new Field("column", "F_NAME", Field.Store.NO,
doc1.add(new Field("data", "Chris Hank Tony Cody Tom Tina Crystal",
Field.Store.NO, Field.Index.ANALYZED,

Document doc2 = new Document();
doc2.add(new Field("table", "EMPLOYEE_CA", Field.Store.NO,
doc2.add(new Field("column", "F_NAME", Field.Store.NO,
doc2.add(new Field("data", "Bob Billy Tom Toby Charles Krista Madonna",
Field.Store.NO, Field.Index.ANALYZED,

// I know I can create a query to search for a regular expression, and that
// will return each document that contains a match.

IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(),
searcher = new IndexSearcher(directory);

RegexQuery query = new RegexQuery(new Term("data", "^T.*"));
// grab the score docs and go through them to find the documents that
// contain a match
ScoreDoc[] hits = searcher.search(query, null, maxNumOfHits).scoreDocs;


The code above will tell me that both doc1 and doc2 contain a match for the
constructed query. 

However, I need to know how many times the regular expression was matched in
each document, i.e.:

doc1 = 3
doc2 = 2
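
To make the expected counts concrete, here is a minimal sketch in plain Java that reproduces them outside of Lucene. It is not a Lucene-based solution: it assumes the "data" field is split on whitespace (as WhitespaceAnalyzer would do) and that a token counts as a match only when the whole token matches the pattern. The class name RegexMatchCounter and the method countMatches are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): counts matching tokens in raw text.
class RegexMatchCounter {

    // Returns how many whitespace-separated tokens of `data` match `regex` in full.
    static int countMatches(String data, String regex) {
        Pattern pattern = Pattern.compile(regex);
        int count = 0;
        for (String token : data.trim().split("\\s+")) {
            // split() yields one empty token for empty input, so skip it
            if (!token.isEmpty() && pattern.matcher(token).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String regex = "^T.*";
        // Same "data" values as doc1 and doc2 above
        System.out.println("doc1 = "
                + countMatches("Chris Hank Tony Cody Tom Tina Crystal", regex));
        System.out.println("doc2 = "
                + countMatches("Bob Billy Tom Toby Charles Krista Madonna", regex));
    }
}
```

This only works when the original field text is available, which is not the case for fields indexed with Field.Store.NO, so it describes the desired result rather than a way to get it from the index.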

I hope I am being clear...and thanks in advance.

