lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan Davis <dansm...@gmail.com>
Subject Re: Need Help with custom ZIPURLDataSource class
Date Mon, 26 Jan 2015 15:47:42 GMT
I have seen such errors by looking under Logging in the Solr Admin UI.
There is also the LogTransformer for Data Import Handler.

However, it is a design choice in Data Import Handler to skip fields not in
the schema.   I would suggest you always use Debug and Verbose to do the
first couple of documents through the GUI, and then look at the debugging
output with a fine toothed comb.

I'm not sure whether there's an option for it, but it would be nice if the
Data Import Handler could collect skipped fields into the status response.
  That would highlight your problem without forcing you to look in other
areas.


On Fri, Jan 23, 2015 at 9:51 PM, Carl Roberts <carl.roberts.zapata@gmail.com
> wrote:

> NVM - I have this working.
>
> The problem was this:  pk="link" in rss-dat.config.xml but unique id not
> link in schema.xml - it is id.
>
> From rss-data-config.xml:
>
> <entity name="cve-2002"
> *pk="link"*
>                 url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.
> xml.zip"
>                 processor="XPathEntityProcessor"
>                 forEach="/nvd/entry">
>             <field column="id" xpath="/nvd/entry/@id" commonField="true" />
>             <field column="cve" xpath="/nvd/entry/cve-id"
> commonField="true" />
>             <field column="cwe" xpath="/nvd/entry/cwe/@id"
> commonField="true" />
>             <!--
>             <field column="vulnerable-configuration"
> xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name"
> commonField="false" />
>             <field column="vulnerable-software"
> xpath="/nvd/entry/vulnerable-software-list/product" commonField="false" />
>             <field column="published" xpath="/nvd/entry/published-datetime"
> commonField="false" />
>             <field column="modified" xpath="/nvd/entry/last-modified-datetime"
> commonField="false" />
>             <field column="summary" xpath="/nvd/entry/summary"
> commonField="false" />
>             -->
>         </entity>
>
> From schema.xml:
>
> * <uniqueKey>id</uniqueKey>
>
> *What really bothers me is that there were no errors output by Solr to
> indicate this type of misconfiguration error and all the messages that Solr
> gave indicated the import was successful.  This lack of appropriate error
> reporting is a pain, especially for someone learning Solr.
>
> Switching pk="link" to pk="id" solved the problem and I was then able to
> import the data.
>
> On 1/23/15, 6:34 PM, Carl Roberts wrote:
>
>>
>> Hi,
>>
>> I created a custom ZIPURLDataSource class to unzip the content from an
>> http URL for an XML ZIP file and it seems to be working (at least I have
>> no errors), but no data is imported.
>>
>> Here is my configuration in rss-data-config.xml:
>>
>> <dataConfig>
>> <dataSource type="ZIPURLDataSource" connectionTimeout="15000"
>> readTimeout="30000"/>
>> <document>
>> <entity name="cve-2002"
>> pk="link"
>> url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
>> processor="XPathEntityProcessor"
>> forEach="/nvd/entry"
>> transformer="DateFormatTransformer">
>> <field column="id" xpath="/nvd/entry/@id" commonField="true" />
>> <field column="cve" xpath="/nvd/entry/cve-id" commonField="true" />
>> <field column="cwe" xpath="/nvd/entry/cwe/@id" commonField="true" />
>> <field column="vulnerable-configuration"
>> xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name"
>> commonField="false" />
>> <field column="vulnerable-software"
>> xpath="/nvd/entry/vulnerable-software-list/product" commonField="false"
>> />
>> <field column="published" xpath="/nvd/entry/published-datetime"
>> commonField="false" />
>> <field column="modified" xpath="/nvd/entry/last-modified-datetime"
>> commonField="false" />
>> <field column="summary" xpath="/nvd/entry/summary" commonField="false" />
>> </entity>
>> </document>
>> </dataConfig>
>>
>>
>> Attached is the ZIPURLDataSource.java file.
>>
>> It actually unzips and saves the raw XML to disk, which I have verified
>> to be a valid XML file.  The file has one or more entries (here is an
>> example):
>>
>> <nvd xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.1"
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> xmlns:patch="http://scap.nist.gov/schema/patch/0.1"
>> xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4"
>> xmlns:cvss="http://scap.nist.gov/schema/cvss-v2/0.2"
>> xmlns:cpe-lang="http://cpe.mitre.org/language/2.0"
>> xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0"
>> pub_date="2015-01-10T05:37:05"
>> xsi:schemaLocation="http://scap.nist.gov/schema/patch/0.1
>> http://nvd.nist.gov/schema/patch_0.1.xsd
>> http://scap.nist.gov/schema/scap-core/0.1
>> http://nvd.nist.gov/schema/scap-core_0.1.xsd
>> http://scap.nist.gov/schema/feed/vulnerability/2.0
>> http://nvd.nist.gov/schema/nvd-cve-feed_2.0.xsd" nvd_xml_version="2.0">
>> <entry id="CVE-1999-0001">
>> <vuln:vulnerable-configuration id="http://nvd.nist.gov/">
>> <cpe-lang:logical-test operator="OR" negate="false">
>> <cpe-lang:fact-ref name="cpe:/o:bsdi:bsd_os:3.1"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.0"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.1"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.1.5.1"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.2"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.0"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.0.5"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.5"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.6"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.6.1"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.7"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.7.1"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.3"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.4"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.5"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.6"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.8"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:3.0"/>
>> <cpe-lang:fact-ref name="cpe:/o:openbsd:openbsd:2.3"/>
>> <cpe-lang:fact-ref name="cpe:/o:openbsd:openbsd:2.4"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.2"/>
>> <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.0.1"/>
>> </cpe-lang:logical-test>
>> </vuln:vulnerable-configuration>
>> <vuln:vulnerable-software-list>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.2.8</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:1.1.5.1</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.2.3</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.2.2</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.2.5</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.2.4</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.0.5</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.2.6</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.1.6.1</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.0.1</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.2</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.0</vuln:product>
>> <vuln:product>cpe:/o:openbsd:openbsd:2.3</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:3.0</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:1.1</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.1.6</vuln:product>
>> <vuln:product>cpe:/o:openbsd:openbsd:2.4</vuln:product>
>> <vuln:product>cpe:/o:bsdi:bsd_os:3.1</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:1.0</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.1.7</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:1.2</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.1.5</vuln:product>
>> <vuln:product>cpe:/o:freebsd:freebsd:2.1.7.1</vuln:product>
>> </vuln:vulnerable-software-list>
>> <vuln:cve-id>CVE-1999-0001</vuln:cve-id>
>> <vuln:published-datetime>1999-12-30T00:00:00.000-05:00</vuln:published-datetime>
>>
>> <vuln:last-modified-datetime>2010-12-16T00:00:00.000-05:00</vuln:last-modified-datetime>
>>
>> <vuln:cvss>
>> <cvss:base_metrics>
>> <cvss:score>5.0</cvss:score>
>> <cvss:access-vector>NETWORK</cvss:access-vector>
>> <cvss:access-complexity>LOW</cvss:access-complexity>
>> <cvss:authentication>NONE</cvss:authentication>
>> <cvss:confidentiality-impact>NONE</cvss:confidentiality-impact>
>> <cvss:integrity-impact>NONE</cvss:integrity-impact>
>> <cvss:availability-impact>PARTIAL</cvss:availability-impact>
>> <cvss:source>http://nvd.nist.gov</cvss:source>
>> <cvss:generated-on-datetime>2004-01-01T00:00:00.000-05:00</cvss:generated-on-datetime>
>>
>> </cvss:base_metrics>
>> </vuln:cvss>
>> <vuln:cwe id="CWE-20"/>
>> <vuln:references reference_type="UNKNOWN" xml:lang="en">
>> <vuln:source>OSVDB</vuln:source>
>> <vuln:reference href="http://www.osvdb.org/5707"
>> xml:lang="en">5707</vuln:reference>
>> </vuln:references>
>> <vuln:references reference_type="UNKNOWN" xml:lang="en">
>> <vuln:source>CONFIRM</vuln:source>
>> <vuln:reference href="http://www.openbsd.org/errata23.html#tcpfix"
>> xml:lang="en">http://www.openbsd.org/errata23.html#tcpfix</vuln:reference>
>>
>> </vuln:references>
>> <vuln:summary>ip_input.c in BSD-derived TCP/IP implementations allows
>> remote attackers to cause a denial of service (crash or hang) via
>> crafted packets.</vuln:summary>
>> </entry>
>>
>>
>> Here is the curl command:
>>
>> curl http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import
>>
>> And here is the output from the console for Jetty:
>>
>> main{StandardDirectoryReader(segments_1:1:nrt)}
>> 2407 [coreLoadExecutor-5-thread-1] INFO
>> org.apache.solr.core.CoreContainer – registering core: nvd-rss
>> 2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter –
>> user.dir=/Users/carlroberts/dev/solr-4.10.3/example
>> 2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter –
>> SolrDispatchFilter.init() done
>> 2431 [main] INFO org.eclipse.jetty.server.AbstractConnector – Started
>> SocketConnector@0.0.0.0:8983
>> 2450 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore –
>> [nvd-rss] webapp=null path=null
>> params={event=firstSearcher&q=static+firstSearcher+warming+
>> in+solrconfig.xml&distrib=false}
>> hits=0 status=0 QTime=43
>> 2451 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore –
>> QuerySenderListener done.
>> 2451 [searcherExecutor-6-thread-1] INFO
>> org.apache.solr.handler.component.SpellCheckComponent – Loading spell
>> index for spellchecker: default
>> 2451 [searcherExecutor-6-thread-1] INFO
>> org.apache.solr.handler.component.SpellCheckComponent – Loading spell
>> index for spellchecker: wordbreak
>> 2452 [searcherExecutor-6-thread-1] INFO
>> org.apache.solr.handler.component.SuggestComponent – Loading suggester
>> index for: mySuggester
>> 2452 [searcherExecutor-6-thread-1] INFO
>> org.apache.solr.spelling.suggest.SolrSuggester – reload()
>> 2452 [searcherExecutor-6-thread-1] INFO
>> org.apache.solr.spelling.suggest.SolrSuggester – build()
>> 2459 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore –
>> [nvd-rss] Registered new searcher Searcher@df9e84e[nvd-rss]
>> main{StandardDirectoryReader(segments_1:1:nrt)}
>> 8371 [qtp1640586218-17] INFO
>> org.apache.solr.handler.dataimport.DataImporter – Loading DIH
>> Configuration: rss-data-config.xml
>> 8379 [qtp1640586218-17] INFO
>> org.apache.solr.handler.dataimport.DataImporter – Data Configuration
>> loaded successfully
>> 8383 [Thread-15] INFO org.apache.solr.handler.dataimport.DataImporter –
>> Starting Full Import
>> 8384 [qtp1640586218-17] INFO org.apache.solr.core.SolrCore – [nvd-rss]
>> webapp=/solr path=/dataimport params={command=full-import} status=0
>> QTime=15
>> 8396 [Thread-15] INFO
>> org.apache.solr.handler.dataimport.SimplePropertiesWriter – Read
>> dataimport.properties
>> 23431 [commitScheduler-8-thread-1] INFO
>> org.apache.solr.update.UpdateHandler – start
>> commit{,optimize=false,openSearcher=false,waitSearcher=true,
>> expungeDeletes=false,softCommit=false,prepareCommit=false}
>> 23431 [commitScheduler-8-thread-1] INFO
>> org.apache.solr.update.UpdateHandler – No uncommitted changes. Skipping
>> IW.commit.
>> 23432 [commitScheduler-8-thread-1] INFO
>> org.apache.solr.update.UpdateHandler – end_commit_flush
>> 47189 [Thread-15] INFO
>> org.apache.solr.handler.dataimport.ZIPURLDataSource – raw
>> bytes={19485161}
>> 47301 [Thread-15] INFO
>> org.apache.solr.handler.dataimport.ZIPURLDataSource – bytes available
>> are {19485161}
>> 47840 [Thread-15] INFO org.apache.solr.handler.dataimport.DocBuilder –
>> Import completed successfully
>> 47840 [Thread-15] INFO org.apache.solr.update.UpdateHandler – start
>> commit{,optimize=false,openSearcher=true,waitSearcher=true,
>> expungeDeletes=false,softCommit=false,prepareCommit=false}
>> 47840 [Thread-15] INFO org.apache.solr.update.UpdateHandler – No
>> uncommitted changes. Skipping IW.commit.
>> 47841 [Thread-15] INFO org.apache.solr.core.SolrCore – SolrIndexSearcher
>> has not changed - not re-opening: org.apache.solr.search.
>> SolrIndexSearcher
>> 47841 [Thread-15] INFO org.apache.solr.update.UpdateHandler –
>> end_commit_flush
>>
>> Can someone please help me figure out why the data is not being
>> imported? Perhaps I missed something?
>>
>> Regards,
>>
>> Joe
>>
>>
>>
>>
>>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message