lucene-solr-user mailing list archives

From Martin Ruckli <martin.ruc...@buzzamite.ch>
Subject LanguageDetection inside of ExtractingRequestHandler
Date Tue, 19 Jun 2012 15:10:38 GMT
Hi all,

I just wanted to check if there is a demand for this feature. I had to implement this functionality
for one of our customers and would like to contribute it.

Here is the use case:
We are using the ExtractingRequestHandler with the extractOnly=true flag set.
A request to this handler returns the content of a posted document, which is what we want.
We would also like to detect the language and return it as a metadata field in the response
from Solr.
As Solr already integrates Tika-based language detection, the only thing I did was add a
new param to enable or disable this feature and then do the language detection in nearly
the same way as it is done in the TikaLanguageIdentifierUpdateProcessor.
I think this would be a nice addition, especially in the extractOnly mode.
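
To make the idea concrete, here is a rough sketch of the control flow described above. It is not the actual patch and does not use Solr's classes: the parameter name "detectLanguage", the metadata key "language", and the class and method names are all assumptions made up for illustration. The detector is passed in as a plain function; in a real handler it would presumably wrap Tika's language identification, as the TikaLanguageIdentifierUpdateProcessor does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative sketch only: not the actual patch and not Solr's API. */
class ExtractOnlyLanguageSketch {

    // Hypothetical request parameter; the message does not name the new param.
    static final String PARAM_DETECT_LANGUAGE = "detectLanguage";

    // Hypothetical metadata key under which the detected language is returned.
    static final String FIELD_LANGUAGE = "language";

    /**
     * Returns a copy of the extracted metadata, with the detected language
     * added when the (hypothetical) flag is set to true.
     *
     * @param params        request parameters, e.g. extractOnly=true
     * @param metadata      metadata produced by the extraction step
     * @param extractedText plain text extracted from the posted document
     * @param detector      maps text to a language code; stands in for the
     *                      Tika-based detection (an assumption of this sketch)
     */
    static Map<String, String> withLanguage(
            Map<String, String> params,
            Map<String, String> metadata,
            String extractedText,
            Function<String, String> detector) {

        Map<String, String> result = new LinkedHashMap<>(metadata);

        // Detection is off unless the flag is explicitly set to "true".
        boolean enabled = Boolean.parseBoolean(
                params.getOrDefault(PARAM_DETECT_LANGUAGE, "false"));

        if (enabled && extractedText != null && !extractedText.isEmpty()) {
            String lang = detector.apply(extractedText);
            // Only add the field when the detector actually returned a code.
            if (lang != null && !lang.isEmpty()) {
                result.put(FIELD_LANGUAGE, lang);
            }
        }
        return result;
    }
}
```

With the flag unset, the metadata comes back unchanged, so existing extractOnly clients would see no difference in the response.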

What are your thoughts on this?

Cheers
Martin

