lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3434) CSVRequestHandler does not parse header properly
Date Fri, 04 May 2012 02:10:48 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268040#comment-13268040 ]

Hoss Man commented on SOLR-3434:
--------------------------------

bq. but I did have to remove the spaces after the header names

That right there seems to be the crux of the issue.

The {{header=true}} parsing is working fine, but the devil is in the details of the docs for the "trim" option...

http://wiki.apache.org/solr/UpdateCSV#trim

bq. If true remove leading and trailing whitespace from values. ...

...it was only ever designed to trim the _values_, not the names of the fields in the header.

Using the 3.6 example, you can see this clearly with data like...

{noformat}
|foo_s   |book_d_i   |id           |name_id_i
|--------|-----------|-------------|-----------
|20120420|      15600|   2070469502|      12787
|20120420|      64400|   2070469503|      12787
{noformat}

With header=true, this generates a very clear error...

{noformat}
SEVERE: org.apache.solr.common.SolrException: undefined field: "foo_s   "
	at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1261)
	at org.apache.solr.handler.CSVLoader.prepareFields(CSVRequestHandler.java:290)
{noformat}

I suspect David didn't get this kind of error with his fields because of a {{"\*"}} dynamicField.
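To make the dynamicField theory concrete, here is a simplified Java model of what could happen to the reporter's padded header names. It is hypothetical: it is not Solr's IndexSchema or CSVLoader code, it assumes the schema declares only jq_idn explicitly, and it treats a "*" dynamicField as accepting any name verbatim, trailing spaces included.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of field resolution (not Solr's IndexSchema):
 * explicit fields match by exact name; an optional "*" dynamicField accepts any name.
 */
public class PaddedHeaderModel {

    // Resolve a header name against the schema, mimicking the "undefined field" error.
    static String resolve(String name, Set<String> explicitFields, boolean catchAll) {
        if (explicitFields.contains(name)) {
            return name;
        }
        if (catchAll) {
            return name; // "*" matches anything, trailing spaces included
        }
        throw new IllegalArgumentException("undefined field: \"" + name + "\"");
    }

    // Pair each (untrimmed) header name with its (already trimmed) value.
    static Map<String, String> buildDoc(String[] names, String[] values,
                                        Set<String> explicitFields, boolean catchAll) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            doc.put(resolve(names[i], explicitFields, catchAll), values[i]);
        }
        return doc;
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("jq_idn"); // assumption: only the uniqueKey is declared
        String[] names  = {"busdate ", "book_id    ", "jq_idn       ", "name_id"};
        String[] values = {"20120420", "15600", "2070469502", "12787"};

        // With a "*" dynamicField every padded name is accepted...
        Map<String, String> doc = buildDoc(names, values, schema, true);
        // ...but no field is literally named "jq_idn", so the uniqueKey is missing.
        System.out.println("has jq_idn? " + doc.containsKey("jq_idn")); // has jq_idn? false

        // Without the catch-all, the first padded name is rejected outright.
        try {
            buildDoc(names, values, schema, false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // undefined field: "busdate "
        }
    }
}
```

Under those assumptions the catch-all hides the padding problem, and the failure only surfaces later as "Document is missing mandatory uniqueKey field: jq_idn".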

I'm not sure there is really a bug here, since it's working as documented, but I think it would certainly make sense to enhance the handler to also trim the header names when trim=true.
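A minimal sketch of that enhancement, assuming header names should get exactly the same trim() treatment as values when trim=true. The class and method names are illustrative only and are not part of Solr; the split here uses plain String.split rather than the handler's real CSV parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Solr code): when trim=true, trim header names
 * the same way values are already trimmed.
 */
public class TrimHeaderSketch {

    static List<String> headerNames(String headerLine, char separator, boolean trim) {
        List<String> names = new ArrayList<>();
        // Pattern.quote escapes regex metacharacters such as '|'; limit -1 keeps empty columns.
        String regex = Pattern.quote(String.valueOf(separator));
        for (String raw : headerLine.split(regex, -1)) {
            names.add(trim ? raw.trim() : raw);
        }
        return names;
    }

    public static void main(String[] args) {
        String header = "|busdate |book_id    |jq_idn       |name_id";
        // The leading '|' yields an empty first column in both cases.
        System.out.println(headerNames(header, '|', true));  // [, busdate, book_id, jq_idn, name_id]
        System.out.println(headerNames(header, '|', false)); // [, busdate , book_id    , jq_idn       , name_id]
    }
}
```

Reusing the existing trim flag avoids a new request parameter; how the empty leading column should be treated is a separate question this sketch does not settle.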
                
> CSVRequestHandler does not parse header properly
> ------------------------------------------------
>
>                 Key: SOLR-3434
>                 URL: https://issues.apache.org/jira/browse/SOLR-3434
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 3.6
>         Environment: Linux
>            Reporter: david babits
>              Labels: CSV, header, separator
>
> The documentation says:
> header
> true if the first line of the CSV input contains field or column names. The default is header=true. If the fieldnames parameter is absent, these field names will be used when adding documents to the index.
> My command:
> /usr/bin/curl --proxy "" 'http://localhost:8983/solr/update/csv?commit=true&debug=true&separator=|&escape=\&trim=true&header=true&overwrite=true' --data-binary @/tmp/file_with_header.txt -H 'Content-type:text/plain; charset=utf-8'
> My data file (/tmp/file_with_header.txt) :
> |busdate |book_id    |jq_idn       |name_id
> |--------|-----------|-------------|-----------
> |20120420|      15600|   2070469502|      12787
> |20120420|      64400|   2070469503|      12787
> |20120420|     100000|   2070469501|      12787
> |20120420|      60000|   2070469504|      12787
> |20120420|      60000|   2070538002|      12787
> |20120420|     206501|   2070538003|      12787
> |20120420|     199418|   2070538004|      12787
> |20120420|       7000|   2070538005|      12787
> schema.xml: (tried different variations)
> <field name="jq_idn" type="string" indexed="true" stored="true" required="false" />
> <uniqueKey>jq_idn</uniqueKey>
> Stack trace:
> SEVERE: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: jq_idn
>         at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:118)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:229)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
>         at org.apache.solr.handler.CSVLoader.doAdd(CSVRequestHandler.java:416)
>         at org.apache.solr.handler.SingleThreadedCSVLoader.addDoc(CSVRequestHandler.java:431)
>         at org.apache.solr.handler.CSVLoader.load(CSVRequestHandler.java:393)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>         at org.mortbay.jetty.Server.handle(Server.java:326)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>         at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
