lucene-solr-user mailing list archives

From Greg Walters <greg.walt...@answers.com>
Subject Re: SolrCell and indexing HTML
Date Fri, 21 Mar 2014 17:08:43 GMT
I've never tried indexing via Groovy or using SolrCell, but I think you might be working at
too low a level in SolrJ if you're just adding documents. You might try checking out https://wiki.apache.org/solr/Solrj#Adding_Data_to_Solr
though I might be way off base :)

Thanks,
Greg

On Mar 21, 2014, at 11:56 AM, Liz Sommers <lizsworks@gmail.com> wrote:

> I am trying to write a POC about indexing URLs with Solr using SolrJ and
> SolrCell.  (The code is written in Groovy.)
> 
> The relevant code is here
> 
>        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract")
> 
>        req.setParam("literal.id",p.id.toString())
>        req.setParam("extractOnly","true")
>        URL url = new URL(p.url)
>        ContentStream stream = new ContentStreamBase.URLStream(url)
>        req.addContentStream(stream)
> 
>        def result = server.request(req)
>        println "result: ${result}"
> 
> 
> When I set extractOnly to true I get everything in the URL.  All the tags,
> all the stylesheets.  When I set it to false I get a response that has
> nothing in it except
> 
> result: {responseHeader={status=0,QTime=19}}
> 
> When I test it with the admin tools, nothing from the URL has been indexed as
> far as I can tell.
> I know I am doing something wrong with the params, but I haven't figured
> out what.  Can somebody please help me?
> 
> Thanks
> Liz Sommers
> lizzysom@gmail.com
> lizsworks@gmail.com
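
A response of {responseHeader={status=0,...}} is what a successful update
normally looks like, so one possible explanation (an assumption, not something
confirmed in this thread) is that the document was accepted but never
committed, or that the extracted body was not mapped to a field the schema
defines. The sketch below is plain Java with no SolrJ dependency. It only
assembles the parameters one might send to /update/extract for an indexing
request. The class and method names, the "text" target field, and the
"ignored_" prefix are hypothetical and need to match the actual schema.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: assembles Solr Cell request parameters as plain strings.
class ExtractParams {

    // Parameters for an indexing request (extractOnly is deliberately left out).
    static Map<String, String> indexingParams(String id) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", id);        // unique key, as in the quoted code
        params.put("fmap.content", "text");  // assumption: schema has a "text" field
        params.put("uprefix", "ignored_");   // assumption: schema has an ignored_* dynamic field
        params.put("commit", "true");        // make the document visible to searches
        return params;
    }

    // Encodes the parameters as a URL query string, preserving insertion order.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println("/update/extract?" + toQueryString(indexingParams("doc1")));
    }
}
```

In the quoted Groovy code, each entry could be passed with
req.setParam(name, value) in place of the extractOnly line. Whether that
resolves the problem depends on the schema and on how commits are configured.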

