lucene-dev mailing list archives

From "Jeroen Steggink (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-11838) explore supporting Deeplearning4j NeuralNetwork models
Date Mon, 29 Jan 2018 17:21:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343656#comment-16343656 ]

Jeroen Steggink commented on SOLR-11838:
----------------------------------------

As a start, I think applying models for LTR or classifying documents/fields when indexing
would be most useful.

One thing we shouldn't underestimate is the input data structures for neural networks. Depending
on the network structure, a model may require a specific data structure. For example, time-series
vectors are very different from other vectors. Are we doing just bag-of-words, or do we keep the
order of words? How many fields would you like as input? How many inputs can a model have?
(Preferably we would use ComputationGraphs, as they are more flexible.)
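
As a rough illustration of the bag-of-words versus word-order question, here is a minimal
Python sketch. It is not Solr or DL4J code; the toy vocabulary and the function names
`bag_of_words` and `index_sequence` are made up for this example.

```python
from collections import Counter

# Hypothetical toy vocabulary; a real index would hold far more terms.
VOCAB = {"solr": 0, "ranks": 1, "documents": 2, "fast": 3}


def bag_of_words(tokens, vocab=VOCAB):
    """Fixed-length count vector: word order is discarded."""
    counts = Counter(t for t in tokens if t in vocab)
    # Emit one count per vocabulary term, ordered by term id.
    return [counts.get(term, 0) for term in sorted(vocab, key=vocab.get)]


def index_sequence(tokens, vocab=VOCAB):
    """Variable-length list of term ids: word order is preserved."""
    return [vocab[t] for t in tokens if t in vocab]


a = ["solr", "ranks", "documents"]
b = ["documents", "ranks", "solr"]

print(bag_of_words(a), bag_of_words(b))      # same vector for both orders
print(index_sequence(a), index_sequence(b))  # different sequences
```

The two encodings have different shapes (fixed length versus variable length), which is
why the choice affects what kind of network input a model can accept.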

Furthermore, we should think about what is actually going to work. Having one-hot encoding
for all terms in an index could be problematic. There is already a logistic regression implementation
which works great for simple classification. If we're going to use DL4J it should add something
more than Solr already offers.
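
To give a feel for why one-hot encoding every term could be problematic, the following
back-of-the-envelope Python sketch compares a dense one-hot representation with simply
storing term ids. The vocabulary size, document length, and 4-byte value width are
assumptions chosen for illustration, not measurements from any Solr index.

```python
def dense_one_hot_bytes(vocab_size, num_tokens, bytes_per_value=4):
    """Memory for one dense one-hot row per token (assumed 4-byte values)."""
    return vocab_size * num_tokens * bytes_per_value


def term_id_bytes(num_tokens, bytes_per_id=4):
    """Memory when each token is stored as a single integer term id."""
    return num_tokens * bytes_per_id


# Assumed figures: one million distinct terms, a 200-token field value.
vocab_size = 1_000_000
num_tokens = 200

print(dense_one_hot_bytes(vocab_size, num_tokens))  # bytes for dense one-hot
print(term_id_bytes(num_tokens))                    # bytes for term ids
```

Under these assumptions a single field value costs hundreds of megabytes as dense one-hot
rows, versus under a kilobyte as term ids.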

Maybe we can think of a few specific use cases to make a prototype for?

I think we can make a DataVec record reader for Solr (@[~kwatters]). But I guess this is
something we can add to DataVec itself, instead of adding this to Solr. An alternative could
be to use Solr's Streaming API to return data in a format which is efficient and could be
directly used by DataVec.

Another thing I'd like to mention is dependencies. Instead of relying on DL4J specifically,
we could think about abstracting data input and output for machine learning and applying models
in general. As a DL4J user I'm not very interested in running it on a Solr server. I have
dedicated servers running DL4J models which I serve using REST APIs. The reason is that I
have servers with GPUs and lots of RAM dedicated to this type of process. Solr on the other
hand can be very demanding in a different way.
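
One possible shape for such an abstraction is sketched below in Python. Everything here is
hypothetical: the `Model` protocol, the `RemoteModel` class, and the JSON payload with
`features` and `scores` keys are invented for illustration and do not correspond to any
existing Solr or DL4J API. The transport is passed in as a plain function so the example
runs without a network; a real deployment would presumably replace it with an HTTP call to
the model server.

```python
import json
from typing import Callable, Protocol, Sequence


class Model(Protocol):
    """Hypothetical interface: the search side only needs predict()."""

    def predict(self, features: Sequence[float]) -> Sequence[float]: ...


class RemoteModel:
    """Delegates scoring to an external model server via a send function."""

    def __init__(self, send: Callable[[str], str]):
        self._send = send  # takes a JSON request body, returns a JSON reply

    def predict(self, features: Sequence[float]) -> Sequence[float]:
        reply = self._send(json.dumps({"features": list(features)}))
        return json.loads(reply)["scores"]


def fake_server(body: str) -> str:
    """Stand-in for a remote model server: returns the sum of the features."""
    features = json.loads(body)["features"]
    return json.dumps({"scores": [sum(features)]})


model: Model = RemoteModel(fake_server)
print(model.predict([1.0, 2.0]))
```

With an interface like this, the caller would not need to know whether the model runs in
process or on a dedicated GPU server.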

> explore supporting Deeplearning4j NeuralNetwork models
> ------------------------------------------------------
>
>                 Key: SOLR-11838
>                 URL: https://issues.apache.org/jira/browse/SOLR-11838
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Christine Poerschke
>            Priority: Major
>         Attachments: SOLR-11838.patch
>
>
> [~yuyano] wrote in SOLR-11597:
> bq. ... If we think to apply this to more complex neural networks in the future, we will need to support layers ...
> [~malcorn_redhat] wrote in SOLR-11597:
> bq. ... In my opinion, if this is a route Solr eventually wants to go, I think a better strategy would be to just add a dependency on [Deeplearning4j|https://deeplearning4j.org/] ...
> Creating this ticket for the idea to be explored further (if anyone is interested in exploring it), complementary to and independent of the SOLR-11597 RankNet related effort.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

