lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: New to Apache Lucene: Need help in querying data - text with wildCards
Date Mon, 10 Feb 2014 17:10:46 GMT
Most likely your analyzer (which one are you using?) is breaking up your
text into tokens you don't expect.

If you use QueryParser, passing the same analyzer, then it will also
tokenize your query into the same tokens, and you should get the
expected hits.
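
To make the point concrete, here is a rough sketch in plain Java, with no Lucene involved. The `analyze` method is a hypothetical stand-in that splits on anything that is not a letter or digit and lowercases the result; real analyzers follow their own rules, so treat the exact tokens as an assumption. It only illustrates why a raw query term containing special characters misses, while the same input run through the same analysis hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AnalyzerSketch {
    // Hypothetical stand-in for an analyzer: split on every character that is
    // not a letter or digit, then lowercase. Not Lucene's actual behavior.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A document matches only if every query token is among the indexed tokens.
    public static boolean matches(List<String> indexedTokens, List<String> queryTokens) {
        return !queryTokens.isEmpty() && indexedTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String logMessage =
            "[ObjectCreateRule]{maplist/recvmap/recvfrag/recvfragoccurs/recvprop} Pop ''";
        List<String> indexed = analyze(logMessage);
        // The brackets, braces, slashes and quotes are gone from the index.
        System.out.println(indexed);

        String userInput = "[ObjectCreateRule]";
        // Raw term, not analyzed: no indexed token contains the brackets.
        System.out.println(matches(indexed, Arrays.asList(userInput)));
        // Same input passed through the same analysis: the token is found.
        System.out.println(matches(indexed, analyze(userInput)));
    }
}
```

The first lookup fails and the second succeeds, which is the same effect you get when the query goes through the analyzer that was used at index time.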

But you may need your own analyzer to "properly" (by your definition)
tokenize the log messages...
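
As one possible direction, the plain-Java sketch below splits on whitespace only, so punctuation stays inside the tokens. Lucene ships a WhitespaceAnalyzer that works along these lines, but whether that is the right choice for these logs is an assumption; `tokenize` and `anyTokenStartsWith` are hypothetical helpers for illustration, not Lucene API.

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceSketch {
    // Hypothetical tokenizer: split on whitespace only, keep case and punctuation.
    public static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Rough analogue of a trailing-wildcard (prefix) match against single tokens.
    public static boolean anyTokenStartsWith(List<String> tokens, String prefix) {
        for (String token : tokens) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String logMessage =
            "[ObjectCreateRule]{maplist/recvmap/recvfrag/recvfragoccurs/recvprop} Pop ''";
        List<String> tokens = tokenize(logMessage);
        // Three tokens: the bracket/brace expression, "Pop", and the two quotes.
        System.out.println(tokens);

        // Special characters survive, so a prefix containing them can match.
        System.out.println(anyTokenStartsWith(tokens, "[ObjectCreateRule]{maplist/"));
        // Nothing is lowercased here, so "pop" no longer finds "Pop".
        System.out.println(anyTokenStartsWith(tokens, "pop"));
    }
}
```

The trade-off is visible in the last two lines: special characters become searchable, but matching turns case-sensitive and a word buried inside a long token can only be reached with a wildcard.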

Mike McCandless

On Mon, Feb 10, 2014 at 12:06 PM, gudiseashok <> wrote:
> I have an application which is a log analyzer, and I am using Apache Lucene
> to index my data. I am storing only the message in it (I am not storing the
> other fields of my object). I am not using any database, so I am using
> Lucene's store for the message (though it is huge), but I take care of
> deleting this data weekly to start a fresh index.
> I have created a domain object to ease retrieving and indexing my data
> with Lucene.
> I have these kinds of fields in my object:
> className (value is the fully qualified class name with package, example:
> com.domain.infrastructure.MyClass), messageType (value examples: xml, log
> message, exception),
> logLevel, timestamp (I am storing this as a Long),
> and logMessage (contains text and special characters like <, [, {, etc.)
> The main purpose is to retrieve logMessage based on a user request. A few
> scenarios are below...
> Case 1: User can request a SOAP message (messageType: XML) at a
> particular time (timestamp: longVariable).
> Case 2: User can request a particular message (messageType: logMessage) at a
> particular time (timestamp: longVariable) from a particular className
> (className: com.businessdomain.layer.MyClass).
> Case 3: User can request a particular message (messageType: Exception) at a
> log level (logLevel: DEBUG) at a particular time (timestamp: longVariable).
> Currently I am indexing data like this:
> <code>
> document.add(new StringField("className", logsVO.getClassName(), Field.Store.NO));
> document.add(new StringField("logLevel", logsVO.getLogLevel(), Field.Store.NO));
> document.add(new TextField("logMessage", logsVO.getLogMessage(), Field.Store.YES));
> document.add(new StringField("messageType", logsVO.getMessageType().toString(), Field.Store.NO));
> document.add(new NumericDocValuesField("path", logsVO.hashCode()));
> document.add(new LongField("timeStamp", logsVO.getTimeStamp().getTime(), Field.Store.NO));
> </code>
> Actual Log Line is like this:
> 2013-12-19 15:53:42.379 [server.startup : 0]  DEBUG
> o.a.commons.digester3.Digester -
> [ObjectCreateRule]{maplist/recvmap/recvfrag/recvfragoccurs/recvprop} Pop
> ''
> So here 2013-12-19 15:53:42.379 is timestamp,
> [server.startup : 0] - I will ignore this part
> DEBUG   is logLevel,
> 'o.a.commons.digester3.Digester' is className
> [ObjectCreateRule]{maplist/recvmap/recvfrag/recvfragoccurs/recvprop} Pop
> '' ---- This is my logMessage
> Now I am coming to my problem: I have tried PhraseQuery, BooleanQuery and
> WildcardQuery too, but the only time I get results is when I search for a
> small string like "pop" (from the logMessage above). In all other cases,
> where the query has any special characters, I get no results. Can anyone
> suggest what pattern I have to use to satisfy the three user-request cases
> mentioned above?
> I appreciate your help in this regard.