lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-3434) CSVRequestHandler does not trim header when using header=true&trim=true
Date Fri, 04 May 2012 02:22:48 GMT

     [ https://issues.apache.org/jira/browse/SOLR-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-3434:
---------------------------

    Description: 
When using {{header=true&trim=true}}, the field names in the header row are not trimmed.

This is consistent with the documentation, but that doesn't mean it makes sense.

It would be good to change this so that trim=true also applies to the header row (at least by default).

  was:
The documentation says:
header
true if the first line of the CSV input contains field or column names. The default is header=true.
If the fieldnames parameter is absent, these field names will be used when adding documents
to the index.

My command:
/usr/bin/curl --proxy "" \
  'http://localhost:8983/solr/update/csv?commit=true&debug=true&separator=|&escape=\&trim=true&header=true&overwrite=true' \
  --data-binary @/tmp/file_with_header.txt \
  -H 'Content-type:text/plain; charset=utf-8'

My data file (/tmp/file_with_header.txt) :
|busdate |book_id    |jq_idn       |name_id
|--------|-----------|-------------|-----------
|20120420|      15600|   2070469502|      12787
|20120420|      64400|   2070469503|      12787
|20120420|     100000|   2070469501|      12787
|20120420|      60000|   2070469504|      12787
|20120420|      60000|   2070538002|      12787
|20120420|     206501|   2070538003|      12787
|20120420|     199418|   2070538004|      12787
|20120420|       7000|   2070538005|      12787
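
To illustrate why the padded header matters, here is a minimal Python sketch that splits the header row above on the `|` separator. It is not Solr's parsing code; the helper name `header_fields` and its `trim` flag are hypothetical and only mimic the behaviour described in this issue.

```python
# Header row copied from the sample data file above.
header = "|busdate |book_id    |jq_idn       |name_id"

def header_fields(line, separator="|", trim=False):
    """Split a header row into field names, optionally stripping whitespace."""
    names = line.split(separator)
    return [name.strip() for name in names] if trim else names

raw = header_fields(header)                # padding kept, as reported in this issue
trimmed = header_fields(header, trim=True) # what trimming the header would give

print(raw)
print(trimmed)
print("jq_idn" in raw, "jq_idn" in trimmed)  # False True
```

Without trimming, the third column is named `jq_idn` plus trailing spaces, so it does not match the `jq_idn` uniqueKey declared in schema.xml.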

schema.xml: (tried different variations)
    <field name="jq_idn" type="string" indexed="true" stored="true" required="false" />
    <uniqueKey>jq_idn</uniqueKey>


Stack trace:
SEVERE: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: jq_idn
        at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:118)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:229)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
        at org.apache.solr.handler.CSVLoader.doAdd(CSVRequestHandler.java:416)
        at org.apache.solr.handler.SingleThreadedCSVLoader.addDoc(CSVRequestHandler.java:431)
        at org.apache.solr.handler.CSVLoader.load(CSVRequestHandler.java:393)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
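
Until the handler trims header names itself, one possible client-side workaround is to strip the padding from the header row before posting the file. The sketch below is only an assumption about how that could be done; `trim_header_row` is a hypothetical helper, not part of Solr, and it leaves every data row untouched.

```python
def trim_header_row(text, separator="|"):
    """Return text with whitespace stripped from each name in the first line only."""
    first, newline, rest = text.partition("\n")
    cleaned = separator.join(name.strip() for name in first.split(separator))
    return cleaned + newline + rest

# Two lines taken from the sample data file in this report.
sample = (
    "|busdate |book_id    |jq_idn       |name_id\n"
    "|20120420|      15600|   2070469502|      12787\n"
)

fixed = trim_header_row(sample)
print(fixed.splitlines()[0])  # |busdate|book_id|jq_idn|name_id
```

The cleaned text could then be written to a new file and sent with the same curl command as before.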


    
> CSVRequestHandler does not trim header when using header=true&trim=true
> -----------------------------------------------------------------------
>
>                 Key: SOLR-3434
>                 URL: https://issues.apache.org/jira/browse/SOLR-3434
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 3.6
>         Environment: Linux
>            Reporter: david babits
>              Labels: CSV, header, separator
>
> When using {{header=true&trim=true}}, the field names in the header row are not trimmed.
> This is consistent with the documentation, but that doesn't mean it makes sense.
> It would be good to change this so that trim=true also applies to the header row (at least by default).


        


