flume-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: SolrCell help!
Date Tue, 23 Jul 2013 22:20:00 GMT
Unfortunately I'm not at work right now. I'll try as soon as possible!

On Tue, Jul 23, 2013 at 7:48 PM, Wolfgang Hoschek <whoschek@cloudera.com> wrote:

> Seems like a transient mvn repo problem. Can you try again?
>
> Wolfgang.
>
> On Jul 23, 2013, at 1:36 AM, Flavio Pompermaier wrote:
>
> > I still have problems building CDK Data Core Module 0.4.2-SNAPSHOT. Maven
> hangs at:
> >
> > Downloading:
> https://repository.cloudera.com/artifactory/cloudera-repos/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
> > Downloading:
> https://oss.sonatype.org/content/repositories/snapshots/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
> > Jul 23, 2013 10:35:41 AM
> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > INFO: I/O exception (java.net.ConnectException) caught when processing
> request: Connection timed out
> > Jul 23, 2013 10:35:41 AM
> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > INFO: I/O exception (java.net.ConnectException) caught when processing
> request: Connection timed out
> > Jul 23, 2013 10:35:41 AM
> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > INFO: Retrying request
> > Jul 23, 2013 10:35:41 AM
> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > INFO: Retrying request
> >
> >
> >
> > On Tue, Jul 23, 2013 at 10:33 AM, Flavio Pompermaier <
> pompermaier@okkam.it> wrote:
> > Sorry, this is caused by our mirror. I'll remove it and retry.
> >
> >
> > On Tue, Jul 23, 2013 at 10:31 AM, Flavio Pompermaier <
> pompermaier@okkam.it> wrote:
> >
> > I still get this error:
> >
> >  Failed to read artifact descriptor for
> commons-daemon:commons-daemon:jar:1.0.3: Could not transfer artifact
> commons-daemon:commons-daemon:pom:1.0.3 from/to repo (
> http://dev.okkam.it/artifactory/repo): Failed to transfer file:
> http://dev.okkam.it/artifactory/repo/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom.
> Return code is: 409 -> [Help 1]
> >
> >
> > On Tue, Jul 23, 2013 at 10:22 AM, Wolfgang Hoschek <
> whoschek@cloudera.com> wrote:
> > Tests pass on java 6 but fail on java 7. Correspondingly, I have filed
> https://issues.cloudera.org/browse/CDK-80. We'll fix it. Meanwhile,
> please try java 6.
> >
> > Wolfgang.
> >
> > On Jul 23, 2013, at 12:51 AM, Flavio Pompermaier wrote:
> >
> > > I tried to download the current trunk but it doesn't compile. For
> example, it hangs on
> > >
> https://repository.cloudera.com/artifactory/cloudera-repos/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
> > > which doesn't exist anymore.
> > >
> > >
> > > On Mon, Jul 22, 2013 at 11:14 PM, Flavio Pompermaier <
> pompermaier@okkam.it> wrote:
> > > You couldn't be more precise ;)
> > >
> > > Thanks,
> > > Flavio
> > >
> > > On Mon, Jul 22, 2013 at 11:02 PM, Wolfgang Hoschek <
> whoschek@cloudera.com> wrote:
> > > Docs for the xquery and xslt morphline commands are here (look for
> "xquery"):
> https://github.com/cloudera/cdk/blob/master/cdk-morphlines/src/site/confluence/morphlinesReferenceGuide.confluence
> > >
> > > Example morphlines for the new xquery and xslt commands are here:
> https://github.com/cloudera/cdk/tree/master/cdk-morphlines/cdk-morphlines-saxon/src/test/resources/test-morphlines
> > >
> > > Sample input data is here:
> https://github.com/cloudera/cdk/tree/master/cdk-morphlines/cdk-morphlines-saxon/src/test/resources/test-documents
> > >
> > > Unit tests are here:
> https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-saxon/src/test/java/com/cloudera/cdk/morphline/saxon/SaxonMorphlineTest.java
> > >
> > > Wolfgang.
> > >
> > > On Jul 22, 2013, at 1:41 PM, Flavio Pompermaier wrote:
> > >
> > > > Ok, I'll try to follow the code! Just one last thing: for
> morphine-neon I managed to find the tests (in the cdk repository), but for the new
> xslt and xquery commands I'm not able to find the test code. Could you give me a
> pointer?
> > > >
> > > > On Mon, Jul 22, 2013 at 9:21 PM, Wolfgang Hoschek <
> whoschek@cloudera.com> wrote:
> > > > There are many tests for this in the morphlines repo.
> > > >
> > > > Wolfgang.
> > > >
> > > > On Jul 22, 2013, at 11:43 AM, Flavio Pompermaier wrote:
> > > >
> > > > >
> > > > > Thank you for the great support Wolfgang!
> > > > > Flume + Morphlines is undoubtedly an exciting road but it's taking
> me too much time :(
> > > > > Do you think you could add some more tests including readJson and
> the new xquery and xslt in trunk?
> > > > >
> > > > > Best,
> > > > > Flavio
> > > > > On Mon, Jul 22, 2013 at 8:12 PM, Wolfgang Hoschek <
> whoschek@cloudera.com> wrote:
> > > > > Looks like the DcXMLParser spits out a metadata field called
> "title" and another title as part of the Tika XML stream. That metadata
> field is then added to the Solr document by SolrCell. If you add "title" to
> the captures, the title from the XML stream gets added as well by SolrCell.
> > > > >
> > > > > JSON support has been released in morphlines-0.4.1 (which flume
> trunk is now depending on):
> http://cloudera.github.io/cdk/docs/0.4.1/cdk-morphlines/morphlinesReferenceGuide.html#readJson
> > > > >
> > > > > Note that Tika XML doesn't really support/capture XPath extraction
> with SolrCell. We have added proper support for reading, extracting and
> transforming XML and HTML with XPath, XQuery and XSLT on the current
> morphlines trunk (not yet released), similar to the way we already support
> JSON and Avro. This should make XML handling a lot more straightforward,
> and make the very limited XML SolrCell approach obsolete. Look for the new
> "xquery" and "xslt" commands in
> https://github.com/cloudera/cdk/blob/master/cdk-morphlines/src/site/confluence/morphlinesReferenceGuide.confluence
> > > > >
> > > > > Meanwhile, consider using these new commands, or use JSON or Avro,
> or write your own custom morphline commands that extract whatever you want
> from your XML data.
> > > > >
> > > > > Wolfgang.
> > > > >
> > > > > On Jul 22, 2013, at 9:18 AM, Flavio Pompermaier wrote:
> > > > >
> > > > > > Hi to all,
> > > > > > I'm trying to understand how to "master" Morphline configuration
> files in order to put some data into Solr, but I'm facing some problems with
> TestMorphlineSolrSink. This is what I have done:
> > > > > >
> > > > > > 1) Since I want to index the title of testXML.xml (i.e.
> "Tika test document"), I commented out all the parsers except
> org.apache.tika.parser.xml.DcXMLParser (which parses Dublin Core metadata)
> > > > > > 2) In schema.xml I added the following field:
> > > > > >     <field name="title" type="text_en" indexed="true"
> stored="true" multiValued="false" />
> > > > > >
> > > > > > But:
> > > > > >  - If I don't add anything to fmap or capture, everything works
> fine, but I don't understand why (who fills that field?). If I instead add
> title to capture and/or title:title (or dc_title:title) to fmap, Solr
> complains that 2 values are retrieved for 'title' (debugging the values, I
> see the title and one empty value in the 'title' metadata array...).
> > > > > > Thus, the problem is that everything works magically if the
> field is named title, but if I change its name to something like doc_title
> there's no way to make it non-multivalued.  Am I right? How can I fix this
> problem?
> > > > > > - I'd like to handle JSON files. How can I map JSON fields to
> Solr fields? Could someone give a simple example?
> > > > > >
> > > > > > Best,
> > > > > > Flavio
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
> >
> >
> > --
> >
> > Flavio Pompermaier
> > Development Department
> > _______________________________________________
> > OKKAMSrl - www.okkam.it
> >
> > Phone: +(39) 0461 283 702
> > Fax: + (39) 0461 186 6433
> > Email: f.pompermaier@okkam.it
> > Headquarters: Trento (Italy), fraz. Villazzano, Salita dei Molini 2
> > Registered office: Trento (Italy), via Segantini 23
> >
> >
> >
> >
> >
> >
>
>
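
On the "2 values are retrieved for 'title'" error from the original question: a minimal sketch, in plain Python rather than morphline or SolrCell configuration, of what would need to happen to the extracted metadata before it reaches a field declared multiValued="false". The metadata dictionary shape and the function name `to_solr_doc` are assumptions made for illustration only; the sample values mirror the "title plus one empty value" observation above.

```python
def to_solr_doc(metadata, single_valued_fields):
    """Build a document dict from multi-valued metadata.

    Blank values are dropped. For fields listed in single_valued_fields
    only the first non-blank value is kept, so a field declared
    multiValued="false" never receives two values.
    """
    doc = {}
    for name, values in metadata.items():
        # Discard None and whitespace-only entries (e.g. the empty title).
        non_blank = [str(v).strip() for v in values
                     if v is not None and str(v).strip()]
        if not non_blank:
            continue
        if name in single_valued_fields:
            doc[name] = non_blank[0]
        else:
            doc[name] = non_blank
    return doc


# Hypothetical metadata as it might look after parsing the test document.
metadata = {
    "title": ["Tika test document", ""],
    "subject": ["testing", "xml"],
}
doc = to_solr_doc(metadata, single_valued_fields={"title"})
```

In a real pipeline this step would live in a custom morphline command or in the sink's preprocessing; where exactly is left open here.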

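On the question of mapping JSON fields to Solr fields: the idea can be sketched independently of morphlines as a table from Solr field names to paths inside the JSON record. This is plain Python for illustration, not the readJson command's API; the slash-separated path syntax, the field names and the sample record are all assumptions.

```python
import json


def extract(record, path):
    """Follow a slash-separated path such as '/meta/title'.

    Returns None when any step of the path is missing.
    """
    current = record
    for key in path.strip("/").split("/"):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return None
    return current


def json_to_solr_doc(raw, field_paths):
    """Parse a JSON string and copy the mapped values into a flat dict."""
    record = json.loads(raw)
    doc = {}
    for solr_field, path in field_paths.items():
        value = extract(record, path)
        if value is not None:
            doc[solr_field] = value
    return doc


# Hypothetical mapping: Solr field name -> path in the JSON record.
FIELD_PATHS = {
    "id": "/id",
    "doc_title": "/meta/title",
    "author": "/meta/author/name",
}

raw = '{"id": "42", "meta": {"title": "Tika test document"}}'
doc = json_to_solr_doc(raw, FIELD_PATHS)
```

Fields whose path is absent from the record (here `author`) are simply left out of the resulting document instead of being sent as empty values.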