lucene-solr-user mailing list archives

From vidya <vidya.nade...@tcs.com>
Subject indexing pdf files using post tool
Date Tue, 15 Mar 2016 07:17:48 GMT
Hi,
I am trying to index a PDF file using the post tool on my Linux system. When I
run the command
bin/post -c core2 -p 8984 /root/solr/My_CV.pdf
the search results look like this:
"response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "/root/solr-5.5.0/My_CV.pdf",
        "meta_creation_date": [
          "2016-03-15T06:22:17Z"
        ],
        "pdf_pdfversion": [
          1.4
        ],
        "dcterms_created": [
          "2016-03-15T06:22:17Z"
        ],
        "x_parsed_by": [
          "org.apache.tika.parser.DefaultParser",
          "org.apache.tika.parser.pdf.PDFParser"
        ],
        "xmptpg_npages": [
          1
        ],
        "creation_date": [
          "2016-03-15T06:22:17Z"
        ],
        "pdf_encrypted": [
          false
        ],
        "title": [
          "My CV"
        ],
        "stream_content_type": [
          "application/pdf"
        ],
        "created": [
          "Tue Mar 15 06:22:17 UTC 2016"
        ],
        "stream_size": [
          18289
        ],
        "dc_format": [
          "application/pdf; version=1.4"
        ],
        "producer": [
          "wkhtmltopdf"
        ],
        "content_type": [
          "application/pdf"
        ],
        "xmp_creatortool": [
          "þÿ"
        ],
        "resourcename": [
          "/root/solr/My_CV.pdf"
        ],
        "dc_title": [
          "My CV"
        ],
        "_version_": 1528851429701189600
      }


The results contain only the metadata, not the actual content of the PDF file.
How can I index that data?
Please help me with this.
Also, can the post tool be used to index data from HDFS?
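
To make the symptom concrete, here is a minimal Python sketch that checks whether a returned document carries any field that could hold the extracted body text. The sample is a trimmed copy of the document shown above. The candidate field names (content, _text_, text) are only guesses at what such a field might be called in this core, and the helper body_fields_present is a hypothetical name used for illustration.

```python
import json

# Trimmed copy of the document returned in the search results above.
sample = json.loads("""
{
  "id": "/root/solr-5.5.0/My_CV.pdf",
  "title": ["My CV"],
  "stream_size": [18289],
  "content_type": ["application/pdf"],
  "resourcename": ["/root/solr/My_CV.pdf"]
}
""")

# Assumption: guesses at names a body-text field might have in this core.
CANDIDATE_BODY_FIELDS = ("content", "_text_", "text")


def body_fields_present(doc, candidates=CANDIDATE_BODY_FIELDS):
    """Return the candidate body-text fields that actually appear in doc."""
    return [name for name in candidates if name in doc]


found = body_fields_present(sample)
print("body fields returned:", found if found else "none")
```

For the document above, none of the guessed fields is present, which matches what the search results show: only metadata fields are returned.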



--
View this message in context: http://lucene.472066.n3.nabble.com/indexing-pdf-files-using-post-tool-tp4263811.html
Sent from the Solr - User mailing list archive at Nabble.com.
