Hello everybody,
I'm new to all this, so I hope this isn't too basic a question and that it
isn't out of place here.
I'm currently working on an indexing/searching application based on Apache
Lucene core that can process mathematical formulae in MathML format
(an XML-based markup language) and store them in the index for searching.
No trouble there, since I'm building everything on top of Lucene.
But I started to think it would be nice to write this mathematical
extension so that it could be incorporated into Solr as easily as possible
in the future. The thing is, I looked into Solr's sources and, to be
honest, I'm quite confused and don't know how to go about this.
The basic workflow of the whole math processing would be:
1. Check the input document for any math.
2. If any is found, a mathematical unit processes it and produces many
   string-represented formulae with different boosts.
3. Put these strings into the index as they are, without any further
   tokenization.
That's about it.
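
To make the workflow above a bit more concrete, here is a rough sketch in
plain Java. It is only an illustration under several assumptions: the class
and method names (MathWorkflowSketch, findMath, process, extract) and the
boost values 1.0 and 0.5 are made up, the "generalized" variant is just a
placeholder for whatever the real mathematical unit would produce, and the
regex-based MathML handling stands in for a proper XML parser. None of it is
Lucene or Solr API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the math workflow; not Lucene or Solr API.
class MathWorkflowSketch {

    // One string-represented formula plus the boost it should be indexed with.
    static final class WeightedFormula {
        final String text;
        final float boost;

        WeightedFormula(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }
    }

    // Assumption: formulae appear as <math>...</math> fragments in the input.
    private static final Pattern MATH =
            Pattern.compile("<math\\b[^>]*>.*?</math>", Pattern.DOTALL);

    // Step 1: check the input document for any math.
    static List<String> findMath(String document) {
        List<String> found = new ArrayList<>();
        Matcher m = MATH.matcher(document);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Step 2: turn one formula into several strings with different boosts.
    // Placeholder logic: the exact formula gets the (made-up) boost 1.0, and
    // a variant with every identifier replaced by "id" gets boost 0.5.
    static List<WeightedFormula> process(String mathMl) {
        List<WeightedFormula> out = new ArrayList<>();
        // Drop whitespace between tags so equal formulae give equal strings.
        String exact = mathMl.trim().replaceAll(">\\s+<", "><");
        out.add(new WeightedFormula(exact, 1.0f));
        String generalized = exact.replaceAll("<mi>[^<]*</mi>", "<mi>id</mi>");
        if (!generalized.equals(exact)) {
            out.add(new WeightedFormula(generalized, 0.5f));
        }
        return out;
    }

    // Step 3 (not shown): each returned string would be put into the index
    // as a single untokenized term, together with its boost.
    static List<WeightedFormula> extract(String document) {
        List<WeightedFormula> all = new ArrayList<>();
        for (String mathMl : findMath(document)) {
            all.addAll(process(mathMl));
        }
        return all;
    }
}
```

For a document containing <math><mi>x</mi><mo>+</mo><mn>1</mn></math>,
extract would return the formula itself and a second string with x replaced
by the placeholder identifier; a document without math gives an empty list.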
Any ideas? Any help will be appreciated.
Thank you
Martin