lucene-solr-dev mailing list archives

From Grant Ingersoll
Subject Re: Math Processing for Solr
Date Thu, 15 Apr 2010 13:58:22 GMT
Payloads are used to set boosts for tokens.  Have a look at the PayloadTermQuery.  There is
a patch for support in Solr, but it isn't committed yet.
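
To make the idea concrete, here is a minimal sketch of a per-token boost carried as a payload. It deliberately uses no Lucene classes: BoostedToken, encodeFloat, decodeFloat, and expand are hypothetical names for illustration only, and the example formula strings are made up. In Lucene the bytes would be attached to the token during analysis and read back at scoring time by a payload-aware query such as PayloadTermQuery.

```java
import java.util.ArrayList;
import java.util.List;

public class PayloadBoostSketch {

    /** Hypothetical token: a term plus a per-token boost stored as payload bytes. */
    static final class BoostedToken {
        final String term;
        final byte[] payload;

        BoostedToken(String term, float boost) {
            this.term = term;
            this.payload = encodeFloat(boost);
        }
    }

    /** Packs a float into 4 big-endian bytes so it can travel with a token. */
    static byte[] encodeFloat(float value) {
        int bits = Float.floatToIntBits(value);
        return new byte[] {
            (byte) (bits >>> 24), (byte) (bits >>> 16), (byte) (bits >>> 8), (byte) bits
        };
    }

    /** Reverses encodeFloat: rebuilds the float from 4 big-endian bytes. */
    static float decodeFloat(byte[] bytes) {
        int bits = ((bytes[0] & 0xFF) << 24) | ((bytes[1] & 0xFF) << 16)
                 | ((bytes[2] & 0xFF) << 8) | (bytes[3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    /**
     * Hypothetical filter step: keep the input formula at full boost and
     * add one derived variant at a lower boost.
     */
    static List<BoostedToken> expand(String formula, String variant, float lowerBoost) {
        List<BoostedToken> out = new ArrayList<>();
        out.add(new BoostedToken(formula, 1.0f));
        out.add(new BoostedToken(variant, lowerBoost));
        return out;
    }

    public static void main(String[] args) {
        // Print each token with the boost recovered from its payload.
        for (BoostedToken t : expand("a+b", "id+id", 0.5f)) {
            System.out.println(t.term + " boost=" + decodeFloat(t.payload));
        }
    }
}
```

The point of the design is that the boost lives with each token rather than with the Field, which is the limitation described in the quoted message below.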


On Apr 15, 2010, at 8:46 AM, wrote:

> Yes, I considered creating my own analyzer with a set of filters. The trouble
> is that I wouldn't be able to set different boosts for the tokens created by
> the filters (each filter needs to create an additional token alongside the
> input one and set a lower boost for it), which is crucial functionality. Even
> the tokenizer at the beginning of the process needs to set different boosts
> for the different tokens it produces. As far as I know, though, boosts can
> only be set on Fields.
> This is now more of a discussion for the Lucene lists, I guess.
> Thanks for the replies anyway.
> Martin
>> (perhaps more appropriate on solr-user@)
>> It sounds like you want to make a MathML filter?  Check out the
>> analyzer packages...
>> simple example:
>> ryan
>> 2010/4/14  <>:
>>> Hello everybody,
>>> I'm new to all this so I hope this isn't too noob a question and that it
>>> isn't very inappropriate here.
>>> I'm currently working on an indexing/searching application based on
>>> Apache Lucene core that can process mathematical formulae in MathML format
>>> (which is an extension of XML) and store them in the index for searching. No
>>> trouble here, since I'm building everything on top of Lucene.
>>> But I started to think it would be nice to write this mathematical
>>> extension so it could be incorporated into Solr as easily as possible in
>>> the future. The thing is, I looked into Solr's sources and, to be honest,
>>> I'm confused and don't know which way to do this.
>>> The basic workflow of the whole math processing would be:
>>> Check the input document for any math -> if found, a mathematical unit
>>> needs to process it and produce many string-represented formulae with
>>> different boosts -> put these into the index without further tokenization.
>>> That's about it.
>>> Any ideas? Any help will be appreciated.
>>> Thank you
>>> Martin

Grant Ingersoll
