From: Tom Davidson <tdavidson@covario.com>
To: "dev@nutch.apache.org"
Reply-To: dev@nutch.apache.org
Subject: RE: Nutch 2 and Cassandra
Date: Tue, 2 Aug 2011 21:36:16 +0000
Message-ID: <8FC6939DDF1D1440A318713F1E4E94BC02593C@NAEXSAN01.semdirector.local>

I did run into a couple more problems running Nutch 2 with CDH3. See https://issues.apache.org/jira/browse/NUTCH-937. I added a comment on the thread explaining my additional problem. I worked around the problem by unjarring the nutch-2-dev.job and setting the HADOOP_CLASSPATH (see below) environment variable. Not an ideal solution, but it works.

In order to run Nutch 2 on CDH3 I added the following to nutch-site.xml and rebuilt the nutch-2-dev.job:

    <property>
        <name>mapreduce.job.jar.unpack.pattern</name>
        <value>(?:classes/|lib/|plugins/).*</value>
    </property>

    <property>
        <name>plugin.folders</name>
        <value>${job.local.dir}/../jars/plugins</value>
    </property>

And I had to set this environment variable to my expanded plugins folder:

export HADOOP_OPTS="-Djob.local.dir=/<MY HOME>/nutch/plugins"

From: lewis john mcgibbney [mailto:lewis.mcgibbney@gmail.com]
Sent: Tuesday, August 02, 2011 2:00 PM
To: dev@nutch.apache.org
Subject: Re: Nutch 2 and Cassandra

Hi

I've been watching progress on this thread with interest and think that this would be a great addition to the wiki under the following page [1]

I am happy to write it up, however is there anything else we need to be aware of in addition to the material you have provided, for example some latent info that has been assumed or not been explained.

Thank you

[1] http://wiki.apache.org/nutch/ErrorMessagesInNutch2

On Tue, Aug 2, 2011 at 6:32 PM, Tom Davidson <tdavidson@covario.com> wrote:

I found the problem. I am using Cloudera CDH3 and it has a hue plugins jar with an older thrift library in it. I removed the jar from my classpath and all is good. Thanks for your help.

-----Original Message-----
From: Tom Davidson [mailto:tdavidson@covario.com]
Sent: Monday, August 01, 2011 3:29 PM
To: dev@nutch.apache.org
Subject: RE: Nutch 2 and Cassandra

OK...
Are you running with a clustered version of Hadoop? I think you have to have your HADOOP_HOME env variable set. Otherwise it runs in local mode. I have been able to run in local mode, but not in deployed mode.

-----Original Message-----
From: Alexis [mailto:alexis.detreglode@gmail.com]
Sent: Monday, August 01, 2011 3:25 PM
To: dev@nutch.apache.org
Subject: Re: Nutch 2 and Cassandra

Ok this version of hector was properly resolved. Thanks!

These are the logs:
~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject ~/java/workspace/Nutch/seeds
11/08/01 15:17:45 INFO crawl.InjectorJob: InjectorJob: starting
11/08/01 15:17:45 INFO crawl.InjectorJob: InjectorJob: urlDir: /home/alex/java/workspace/Nutch/seeds
11/08/01 15:17:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/08/01 15:17:46 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
11/08/01 15:17:46 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
11/08/01 15:17:47 INFO store.CassandraClient: Keyspace 'webpage' in cluster 'Test Cluster' was created on host 'localhost'
11/08/01 15:17:48 INFO input.FileInputFormat: Total input paths to process : 1
11/08/01 15:17:49 INFO mapred.JobClient: Running job: job_local_0001
11/08/01 15:17:49 INFO input.FileInputFormat: Total input paths to process : 1
11/08/01 15:17:49 INFO mapreduce.GoraRecordWriter: gora.buffer.write.limit = 10000
11/08/01 15:17:49 INFO plugin.PluginRepository: Plugins: looking in: /tmp/hadoop-alex/hadoop-unjar8045717865743865180/plugins
11/08/01 15:17:49 INFO plugin.PluginRepository: Plugin Auto-activation mode: [true]
11/08/01 15:17:49 INFO plugin.PluginRepository: Registered Plugins:
11/08/01 15:17:49 INFO plugin.PluginRepository:         the nutch core extension points (nutch-extensionpoints)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Basic URL Normalizer (urlnormalizer-basic)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Basic Indexing Filter (index-basic)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Html Parse Plug-in (parse-html)
11/08/01 15:17:49 INFO plugin.PluginRepository:         HTTP Framework (lib-http)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Pass-through URL Normalizer (urlnormalizer-pass)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Regex URL Filter (urlfilter-regex)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Http Protocol Plug-in (protocol-http)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Regex URL Normalizer (urlnormalizer-regex)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Tika Parser Plug-in (parse-tika)
11/08/01 15:17:49 INFO plugin.PluginRepository:         OPIC Scoring Plug-in (scoring-opic)
11/08/01 15:17:49 INFO plugin.PluginRepository:         CyberNeko HTML Parser (lib-nekohtml)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Anchor Indexing Filter (index-anchor)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Regex URL Filter Framework (lib-regex-filter)
11/08/01 15:17:49 INFO plugin.PluginRepository: Registered Extension-Points:
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch Protocol (org.apache.nutch.protocol.Protocol)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Parse Filter (org.apache.nutch.parse.ParseFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch URL Filter (org.apache.nutch.net.URLFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch Content Parser (org.apache.nutch.parse.Parser)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
11/08/01 15:17:50 INFO conf.Configuration: found resource regex-normalize.xml at file:/tmp/hadoop-alex/hadoop-unjar8045717865743865180/regex-normalize.xml
11/08/01 15:17:50 INFO conf.Configuration: found resource regex-urlfilter.txt at file:/tmp/hadoop-alex/hadoop-unjar8045717865743865180/regex-urlfilter.txt
11/08/01 15:17:50 INFO regex.RegexURLNormalizer: can't find rules for scope 'inject', using default
11/08/01 15:17:50 INFO mapred.JobClient:  map 0% reduce 0%
11/08/01 15:17:51 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/08/01 15:17:51 INFO mapred.LocalJobRunner:
11/08/01 15:17:51 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/08/01 15:17:52 INFO mapred.JobClient:  map 100% reduce 0%
11/08/01 15:17:52 INFO mapred.JobClient: Job complete: job_local_0001
11/08/01 15:17:52 INFO mapred.JobClient: Counters: 5
11/08/01 15:17:52 INFO mapred.JobClient:   FileSystemCounters
11/08/01 15:17:52 INFO mapred.JobClient:     FILE_BYTES_READ=44872735
11/08/01 15:17:52 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=45245279
11/08/01 15:17:52 INFO mapred.JobClient:   Map-Reduce Framework
11/08/01 15:17:52 INFO mapred.JobClient:     Map input records=3
11/08/01 15:17:52 INFO mapred.JobClient:     Spilled Records=0
11/08/01 15:17:52 INFO mapred.JobClient:     Map output records=3
11/08/01 15:17:52 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
11/08/01 15:17:52 INFO crawl.InjectorJob: InjectorJob: finished

This is what was added to ivy/ivy.xml:

+       <dependency org="org.apache.gora" name="gora-cassandra" rev="0.2-incubating" conf="*->compile"/>
+       <dependency org="org.apache.cassandra" name="cassandra-thrift" rev="0.8.1"/>
+       <dependency org="com.ecyrd.speed4j" name="speed4j" rev="0.9" conf="*->*,!javadoc,!sources"/>
+       <dependency org="com.github.stephenc.high-scale-lib" name="high-scale-lib" rev="1.1.2" conf="*->*,!javadoc,!sources"/>
+       <dependency org="com.google.collections" name="google-collections" rev="1.0" conf="*->*,!javadoc,!sources"/>
+       <dependency org="com.google.guava" name="guava" rev="r09" conf="*->*,!javadoc,!sources"/>
+       <dependency org="org.apache.cassandra" name="apache-cassandra" rev="0.8.1"/>
+       <dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2"/>

On Mon, Aug 1, 2011 at 2:55 PM, Tom Davidson <tdavidson@covario.com> wrote:
> I did something similar to below to add the Cassandra dependencies. Note that I am getting NoSuchMethodErrors not ClassNotFoundExceptions. Can you add the hector jars to your nutch job jar and see what you get? I think I am one step ahead of you.
> BTW, I just added this line to get the hector dependency:
>
>        <dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2" conf="*->default"/>
>
> -----Original Message-----
> From: Alexis [mailto:alexis.detreglode@gmail.com]
> Sent: Monday, August 01, 2011 2:28 PM
> To: dev@nutch.apache.org
> Subject: Re: Nutch 2 and Cassandra
>
> Hi, libthrift is a dependency of cassandra-thrift, as listed here:
> http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-thrift/0.8.1
>
> During Nutch build, you have to manually tweak the Ivy configuration depending on your choice of the Gora store, in this case Cassandra.
> Basically you need to add all the dependencies listed there:
> http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/ivy/ivy.xml?view=markup
>
> Let's try to add to $NUTCH_HOME/ivy/ivy.xml the following dependencies and then let's rebuild Nutch (see attached patch):
>        <dependency org="org.apache.gora" name="gora-cassandra" rev="0.2-incubating" conf="*->compile"/>
>        <dependency org="org.apache.cassandra" name="cassandra-thrift" rev="0.8.1"/>
>        <dependency org="com.ecyrd.speed4j" name="speed4j" rev="0.9" conf="*->*,!javadoc,!sources"/>
>        <dependency org="com.github.stephenc.high-scale-lib" name="high-scale-lib" rev="1.1.2" conf="*->*,!javadoc,!sources"/>
>        <dependency org="com.google.collections" name="google-collections" rev="1.0" conf="*->*,!javadoc,!sources"/>
>        <dependency org="com.google.guava" name="guava" rev="r09" conf="*->*,!javadoc,!sources"/>
>
> $ ant clean
> $ ant
>
> In your case libthrift should now be downloaded by Ivy and then bundled into the nutch-2.0-dev.job file. I'm not sure how apache-cassandra and hector got included in your classpath...
>
> Somehow we need to resolve as well:
>        <dependency org="org.apache.cassandra" name="apache-cassandra" rev="0.8.1"/>
>        <dependency org="me.prettyprint" name="hector" rev="0.8.0-1"/>
>
> I don't think the following 2 jars are in the default maven repository so they won't be downloaded, that's why they were commented in the Gora Cassandra Ivy config (gora/trunk/gora-cassandra/ivy/ivy.xml)
>
> Since hector jar is not found in my case I get:
> ~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject ~/java/workspace/Nutch/seeds
> 11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: starting
> 11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: urlDir: /home/alex/java/workspace/Nutch/seeds
> 11/08/01 14:18:42 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
> 11/08/01 14:18:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 11/08/01 14:18:42 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
>        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
>        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
>        at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
>        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>        at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
>        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
>        ... 12 more
> Caused by: java.lang.NoClassDefFoundError: me/prettyprint/hector/api/Serializer
>        at org.apache.gora.cassandra.store.CassandraStore.(CassandraStore.java:60)
>        ... 18 more
> Caused by: java.lang.ClassNotFoundException: me.prettyprint.hector.api.Serializer
>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>        ... 19 more
>
> On Mon, Aug 1, 2011 at 11:59 AM, Tom Davidson <tdavidson@covario.com> wrote:
>> Hi All,
>>
>> I am kind of at my wit's end here, so I am hoping someone here can help. I am trying to use Nutch2 and Cassandra and I have been successful using the runtime/local build. I am using the Cloudera CDH3 on CentOS 5 and I do not want to contaminate my hadoop install by dropping in a bunch of Nutch jars, etc. So I am trying to use the nutch-2-dev.job jar. When I try to use the nutch2-dev.job jar, I get the error below. I have double and triple checked the classpath and the included jars and the only jar that contains FieldValueMetaData is the libthrift-0.6.1.jar which has the method that is claimed to be missing. Any ideas?
>>
>> Thanks,
>>
>> Tom
>>
>> [tdavidson@nadevsan06 ~]$ bin/nutch inject urls
>>
>> /opt/jdk1.6.0_21/bin/java -Dproc_jar -Xmx1000m
>> -Dhadoop.log.dir=/usr/lib/hadoop-0.20/logs
>> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20
>> -Dhadoop.id.str=tdavidson -Dhadoop.root.logger=INFO,console
>> -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64
>> -Dhadoop.policy.file=hadoop-policy.xml -classpath
>> /usr/lib/hadoop-0.20/conf:/opt/jdk1.6.0_21/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hue-plugins-1.2.0-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar
>> org.apache.hadoop.util.RunJar
>> /home/SEMDIRECTOR/tdavidson/nutch-2.job
>> org.apache.nutch.crawl.InjectorJob urls
>>
>> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: starting
>> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: urlDir: urls
>> 11/08/01 11:51:55 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
>> 11/08/01 11:51:55 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
>> 11/08/01 11:51:55 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
>>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
>>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
>>         at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
>>         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>>         at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>>         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>> Caused by: java.lang.reflect.InvocationTargetException
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>         at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
>>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
>>         ... 12 more
>> Caused by: java.lang.NoSuchMethodError: org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>>         at org.apache.cassandra.thrift.CfDef.(CfDef.java:299)
>>         at org.apache.cassandra.thrift.KsDef.read(KsDef.java:753)
>>         at org.apache.cassandra.thrift.Cassandra$describe_keyspace_result.read(Cassandra.java:24338)
>>         at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_keyspace(Cassandra.java:1371)
>>         at org.apache.cassandra.thrift.Cassandra$Client.describe_keyspace(Cassandra.java:1346)
>>         at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:192)
>>         at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:187)
>>         at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>>         at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
>>         at me.prettyprint.cassandra.service.AbstractCluster.describeKeyspace(AbstractCluster.java:201)
>>         at org.apache.gora.cassandra.store.CassandraClient.checkKeyspace(CassandraClient.java:82)
>>         at org.apache.gora.cassandra.store.CassandraClient.init(CassandraClient.java:69)
>>         at org.apache.gora.cassandra.store.CassandraStore.(CassandraStore.java:68)
>>         ... 18 more
>
--
Lewis
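
The NoSuchMethodError in this thread was traced to a second, older copy of the Thrift classes sitting in another jar on the Hadoop classpath. One way to spot that kind of clash is to list every archive that contains the class named in the error. The following is only a rough sketch in Python, not something taken from Nutch, Gora, or Hadoop: `find_class` and `class_entry` are hypothetical helper names, the `outer!/inner` notation for nested jars is just a display convention chosen here, and it assumes a ':'-separated classpath and looks one level deep into archives such as a .job file.

```python
import io
import zipfile
from pathlib import Path


def class_entry(class_name: str) -> str:
    """Convert a dotted class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def find_class(classpath: str, class_name: str, sep: str = ":") -> list[str]:
    """Return every archive on the classpath that contains class_name.

    Jars nested one level deep (for example lib/*.jar inside a .job
    file) are reported as "outer!/inner".
    """
    entry = class_entry(class_name)
    hits = []
    for item in classpath.split(sep):
        path = Path(item)
        # Skip directories, missing files, and anything that is not a zip.
        if not path.is_file() or not zipfile.is_zipfile(path):
            continue
        with zipfile.ZipFile(path) as outer:
            names = outer.namelist()
            if entry in names:
                hits.append(str(path))
            for name in names:
                if not name.endswith(".jar"):
                    continue
                # Read the nested jar into memory and inspect it as well.
                data = io.BytesIO(outer.read(name))
                if not zipfile.is_zipfile(data):
                    continue
                with zipfile.ZipFile(data) as inner:
                    if entry in inner.namelist():
                        hits.append(f"{path}!/{name}")
    return hits
```

For the case discussed above, one would pass the -classpath string printed by bin/nutch plus the path to the job file, and the class name `org.apache.thrift.meta_data.FieldValueMetaData`. More than one hit suggests that which copy gets loaded depends on classpath order.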

I did run into a couple m= ore problems running Nutch 2 with CDH3. See https://= issues.apache.org/jira/browse/NUTCH-937. I added a comment on the threa= d explaining my additional problem. I worked around the problem by unjarrin= g the nutch-2-dev.job and seeting the HADOOP_CLASSPATH (see below) environment variable. Not an ideal soluti= on, but it works.

 

In order to run Nutch 2 on CDH3 I added the followin= g to nutch-site.xml and rebuilt the nutch-2-dev.job:

 

    <property><= /p>

        <name&= gt;mapreduce.job.jar.unpack.pattern</name>

        <value= >(?:classes/|lib/|plugins/).*</value>

    </property>

 

    <property>

        <name&= gt;plugin.folders</name>

        <value= >${job.local.dir}/../jars/plugins</value>

    </property>

 

And I had to set this environment variable to my exp= anded plugins folder:

 

export HADOOP_OPTS=3D&quo= t;-Djob.local.dir=3D/<MY HOME>/nutch/plugins"<= /p>

 <= /p>

 <= /p>

 <= /p>

 <= /p>

 <= /p>

From: lewis jo= hn mcgibbney [mailto:lewis.mcgibbney@gmail.com]
Sent: Tuesday, August 02, 2011 2:00 PM
To: dev@nutch.apache.org
Subject: Re: Nutch 2 and Cassandra

 

Hi

I've been watching progress on this thread with interest and think that thi= s would be a great addition to the wiki under the following page [1]

I am happy to write it up, however is there anything else we need to be awa= re of in addition to the material you have provided, for example some laten= t info that has been assumed or not been explained.

Thank you

[1] http://w= iki.apache.org/nutch/ErrorMessagesInNutch2

On Tue, Aug 2, 2011 at 6:32 PM, Tom Davidson <tdavidson@covario.com> wrote:<= o:p>

I found the problem. I am using Cloudera CDH3 and it= has a hue plugins jar with an older thrift library in it. I removed the ja= r from my classpath and all is good. Thanks for your help.


-----Original Message-----
From: Tom Davidson [mailto:tdavids= on@covario.com]
Sent: Monday, August 01, 2011 3:29 PM
To: dev@nutch.apache.org

Subject: RE: Nutch 2 and Cassandra

OK... Are you running with a clustered version of Hadoop? I think you have = to have your HADOOP_HOME env variable set. Otherwise it runs in local mode.= I have been able to run in local mode, but not in deployed mode.


-----Original Message-----
From: Alexis [mailto:alexis.= detreglode@gmail.com]
Sent: Monday, August 01, 2011 3:25 PM
To: dev@nutch.apache.org
Subject: Re: Nutch 2 and Cassandra

Ok this version of hector was properly resolved. Thanks!

These are the logs:
~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject
~/java/workspace/Nutch/seeds
11/08/01 15:17:45 INFO crawl.InjectorJob: InjectorJob: starting
11/08/01 15:17:45 INFO crawl.InjectorJob: InjectorJob: urlDir:
/home/alex/java/workspace/Nutch/seeds
11/08/01 15:17:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=3DJobTracker, sessionId=3D
11/08/01 15:17:46 INFO connection.CassandraHostRetryService: Downed
Host Retry service started with queue size -1 and retry delay 10s
11/08/01 15:17:46 INFO service.JmxMonitor: Registering JMX
me.prettyprint.cassandra.service_Test
Cluster:ServiceType=3Dhector,MonitorType=3Dhector
11/08/01 15:17:47 INFO store.CassandraClient: Keyspace 'webpage' in
cluster 'Test Cluster' was created on host 'localhost'
11/08/01 15:17:48 INFO input.FileInputFormat: Total input paths to process = : 1
11/08/01 15:17:49 INFO mapred.JobClient: Running job: job_local_0001
11/08/01 15:17:49 INFO input.FileInputFormat: Total input paths to process = : 1
11/08/01 15:17:49 INFO mapreduce.GoraRecordWriter:
gora.buffer.write.limit =3D 10000
11/08/01 15:17:49 INFO plugin.PluginRepository: Plugins: looking in:
/tmp/hadoop-alex/hadoop-unjar8045717865743865180/plugins
11/08/01 15:17:49 INFO plugin.PluginRepository: Plugin Auto-activation
mode: [true]
11/08/01 15:17:49 INFO plugin.PluginRepository: Registered Plugins:
11/08/01 15:17:49 INFO plugin.PluginRepository:        = the nutch core
extension points (nutch-extensionpoints)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Basic URL
Normalizer (urlnormalizer-basic)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Basic Indexing
Filter (index-basic)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Html Parse
Plug-in (parse-html)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = HTTP Framework
(lib-http)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Pass-through
URL Normalizer (urlnormalizer-pass)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Regex URL
Filter (urlfilter-regex)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Http Protocol
Plug-in (protocol-http)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Regex URL
Normalizer (urlnormalizer-regex)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Tika Parser
Plug-in (parse-tika)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = OPIC Scoring
Plug-in (scoring-opic)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = CyberNeko HTML
Parser (lib-nekohtml)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Anchor
Indexing Filter (index-anchor)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Regex URL
Filter Framework (lib-regex-filter)
11/08/01 15:17:49 INFO plugin.PluginRepository: Registered Extension-Points= :
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Nutch URL
Normalizer (org.apache.nutch.net.URLNormalizer)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Nutch Protocol
(org.apache.nutch.protocol.Protocol)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Parse Filter
(org.apache.nutch.parse.ParseFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Nutch URL
Filter (org.apache.nutch.net.URLFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Nutch Indexing
Filter (org.apache.nutch.indexer.IndexingFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Nutch Content
Parser (org.apache.nutch.parse.Parser)
11/08/01 15:17:49 INFO plugin.PluginRepository:        = Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
11/08/01 15:17:50 INFO conf.Configuration: found resource
regex-normalize.xml at
file:/tmp/hadoop-alex/hadoop-unjar8045717865743865180/regex-normalize.xml 11/08/01 15:17:50 INFO conf.Configuration: found resource
regex-urlfilter.txt at
file:/tmp/hadoop-alex/hadoop-unjar8045717865743865180/regex-urlfilter.txt 11/08/01 15:17:50 INFO regex.RegexURLNormalizer: can't find rules for
scope 'inject', using default
11/08/01 15:17:50 INFO mapred.JobClient:  map 0% reduce 0%
11/08/01 15:17:51 INFO mapred.TaskRunner:
Task:attempt_local_0001_m_000000_0 is done. And is in the process of
commiting
11/08/01 15:17:51 INFO mapred.LocalJobRunner:
11/08/01 15:17:51 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_000000_0' done.
11/08/01 15:17:52 INFO mapred.JobClient:  map 100% reduce 0%
11/08/01 15:17:52 INFO mapred.JobClient: Job complete: job_local_0001
11/08/01 15:17:52 INFO mapred.JobClient: Counters: 5
11/08/01 15:17:52 INFO mapred.JobClient:   FileSystemCounters
11/08/01 15:17:52 INFO mapred.JobClient:     FILE_BYTES_READ=3D44= 872735
11/08/01 15:17:52 INFO mapred.JobClient:     FILE_BYTES_WRITTEN= =3D45245279
11/08/01 15:17:52 INFO mapred.JobClient:   Map-Reduce Framework
11/08/01 15:17:52 INFO mapred.JobClient:     Map input records=3D= 3
11/08/01 15:17:52 INFO mapred.JobClient:     Spilled Records=3D0<= br> 11/08/01 15:17:52 INFO mapred.JobClient:     Map output records= =3D3
11/08/01 15:17:52 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics
with processName=3DJobTracker, sessionId=3D - already initialized
11/08/01 15:17:52 INFO crawl.InjectorJob: InjectorJob: finished



This is what was added to ivy/ivy.xml:

+       <dependency org=3D"org.apache.gora"= name=3D"gora-cassandra"
rev=3D"0.2-incubating" conf=3D"*->compile"/>
+       <dependency org=3D"org.apache.cassandra&= quot; name=3D"cassandra-thrift"
rev=3D"0.8.1"/>
+       <dependency org=3D"com.ecyrd.speed4j&quo= t; name=3D"speed4j" rev=3D"0.9"
conf=3D"*->*,!javadoc,!sources"/>
+       <dependency org=3D"com.github.stephenc.h= igh-scale-lib"
name=3D"high-scale-lib" rev=3D"1.1.2" conf=3D"*-&g= t;*,!javadoc,!sources"/>
+       <dependency org=3D"com.google.collection= s"
name=3D"google-collections" rev=3D"1.0" conf=3D"*-= >*,!javadoc,!sources"/>
+       <dependency org=3D"com.google.guava"= ; name=3D"guava" rev=3D"r09"
conf=3D"*->*,!javadoc,!sources"/>
+       <dependency org=3D"org.apache.cassandra&= quot; name=3D"apache-cassandra"
rev=3D"0.8.1"/>
+       <dependency org=3D"me.prettyprint" = name=3D"hector-core" rev=3D"0.8.0-2"/>



On Mon, Aug 1, 2011 at 2:55 PM, Tom Davidson <tdavidson@covario.com> wrote:
> I did something similar to below to add the Cassandra dependencies. No= te that I am getting NoSuchMethodErrors not ClassNotFoundExceptions. Can yo= u add the hector jars to your nutch job jar and see what you get? I think I= am one step ahead of you. BTW, I just added this line to get the hector dependency:
>
>        <dependency org=3D"me.prettyprint&q= uot; name=3D"hector-core" rev=3D"0.8.0-2" conf=3D"= *->default"/>
>
> -----Original Message-----
> From: Alexis [mailto:al= exis.detreglode@gmail.com]
> Sent: Monday, August 01, 2011 2:28 PM
> To: dev@nutch.apache.org > Subject: Re: Nutch 2 and Cassandra
>
> Hi, libthrift is a dependency of cassandra-thrift, as listed here:
> http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-thrift/0.8= .1
>
> During Nutch build, you have to manually tweak the Ivy configuration d= epending on your choice of the Gora store, in this case Cassandra.
> Basically you need to add all the dependencies listed there:
> http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/ivy/ivy.xm= l?view=3Dmarkup
>
> Let's try to add to $NUTCH_HOME/ivy/ivy.xml the following dependencies= and then let's rebuild Nutch (see attached patch):
>        <dependency org=3D"org.apache.gora&= quot; name=3D"gora-cassandra"
> rev=3D"0.2-incubating" conf=3D"*->compile"/>=
>        <dependency org=3D"org.apache.cassa= ndra" name=3D"cassandra-thrift" rev=3D"0.8.1"/>=
>        <dependency org=3D"com.ecyrd.speed4= j" name=3D"speed4j" rev=3D"0.9"
> conf=3D"*->*,!javadoc,!sources"/>
>        <dependency org=3D"com.github.steph= enc.high-scale-lib"
> name=3D"high-scale-lib" rev=3D"1.1.2" conf=3D"= ;*->*,!javadoc,!sources"/>
>        <dependency org=3D"com.google.colle= ctions" name=3D"google-collections"
> rev=3D"1.0" conf=3D"*->*,!javadoc,!sources"/>= ;
>        <dependency org=3D"com.google.guava= " name=3D"guava" rev=3D"r09"
> conf=3D"*->*,!javadoc,!sources"/>
>
> $ ant clean
> $ ant
>
> In your case libthrift should now be downloaded by Ivy and then bundle= d into the nutch-2.0-dev.job file. I'm not sure how apache-cassandra and he= ctor got included in your classpath...
>
> Somehow we need to resolve as well:
>        <dependency org="org.apache.cassandra" name="apache-cassandra" rev="0.8.1"/>
>        <dependency org="me.prettyprint" name="hector" rev="0.8.0-1"/>
>
> I don't think these 2 jars are in the default Maven repository, so they won't be downloaded; that's why they were commented out in the Gora Cassandra Ivy config (gora/trunk/gora-cassandra/ivy/ivy.xml).
>
>
> Since hector jar is not found in my case I get:
> ~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject ~/java/workspace/Nutch/seeds
> 11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: starting
> 11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: urlDir: /home/alex/java/workspace/Nutch/seeds
> 11/08/01 14:18:42 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
> 11/08/01 14:18:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 11/08/01 14:18:42 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
>        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
>        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
>        at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
>        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>        at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
>        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
>        ... 12 more
> Caused by: java.lang.NoClassDefFoundError: me/prettyprint/hector/api/Serializer
>        at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:60)
>        ... 18 more
> Caused by: java.lang.ClassNotFoundException: me.prettyprint.hector.api.Serializer
>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>        ... 19 more
>
>
>
>
> On Mon, Aug 1, 2011 at 11:59 AM, Tom Davidson <tdavidson@covario.com> wrote:
>> Hi All,
>>
>>
>>
>> I am kind of at my wit's end here, so I am hoping someone here can
>> help.  I am trying to use Nutch 2 and Cassandra and I have been
>> successful using the runtime/local build. I am using Cloudera CDH3
>> on CentOS 5 and I do not want to contaminate my Hadoop install by
>> dropping in a bunch of Nutch jars, etc. So I am trying to use the
>> nutch-2-dev.job jar. When I try to use the nutch-2-dev.job jar, I get
>> the error below.  I have double and triple checked the classpath and
>> the included jars, and the only jar that contains FieldValueMetaData is
>> libthrift-0.6.1.jar, which has the method that is claimed to be missing. Any ideas?
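For this kind of check, it can help to scan every jar on the classpath mechanically rather than by eye, since a second copy of a class can hide in an unrelated jar (as turned out to be the case with the hue plugins jar in CDH3). Below is a rough sketch under some assumptions: `FindClassInJars` and `jarsContaining` are names invented here, and the scan only looks at top-level entries of each jar, so jars nested under lib/ inside a .job file would need to be unpacked first.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

public class FindClassInJars {
    /**
     * Returns every jar in a File.pathSeparator-delimited classpath
     * that contains an entry for the given fully qualified class name.
     */
    static List<String> jarsContaining(String classpath, String className) {
        String entry = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<>();
        for (String element : classpath.split(File.pathSeparator)) {
            File f = new File(element);
            // Skip directories, missing files, and anything that is not a jar.
            if (!f.isFile() || !element.endsWith(".jar")) {
                continue;
            }
            try (ZipFile zip = new ZipFile(f)) {
                if (zip.getEntry(entry) != null) {
                    hits.add(element);
                }
            } catch (IOException e) {
                System.err.println("skipping unreadable jar: " + element);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Defaults: the JVM's own classpath and the class from the error in this thread.
        String cp = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        String cls = args.length > 1 ? args[1]
                : "org.apache.thrift.meta_data.FieldValueMetaData";
        for (String jar : jarsContaining(cp, cls)) {
            System.out.println(jar);
        }
    }
}
```

More than one line of output for the same class is a strong hint that two versions are competing, and the one listed first on the classpath normally wins.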
>>
>>
>>
>> Thanks,
>>
>> Tom
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> [tdavidson@nadevsan06 ~]$ bin/nutch inject urls
>>
>> /opt/jdk1.6.0_21/bin/java -Dproc_jar -Xmx1000m
>> -Dhadoop.log.dir=/usr/lib/hadoop-0.20/logs
>> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20
>> -Dhadoop.id.str=tdavidson -Dhadoop.root.logger=INFO,console
>> -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64
>> -Dhadoop.policy.file=hadoop-policy.xml -classpath
>> /usr/lib/hadoop-0.20/conf:/opt/jdk1.6.0_21/lib/tools.jar:/usr/lib/hado
>> op-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u1.jar:/usr/lib/ha
>> doop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt
>> -1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/ha
>> doop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-cod
>> ec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/
>> hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-ht
>> tpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:
>> /usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop
>> -0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.ja
>> r:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar:/usr
>> /lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hue-
>> plugins-1.2.0-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5
>> .2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/
>> hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/ja
>> sper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr
>> /lib/hadoop-0.20/lib/jetty-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-s
>> ervlet-tester-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.ja
>> r:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/ju
>> nit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.2
>> 0/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:
>> /usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servle
>> t-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14
>> .jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20
>> /lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:
>> /usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/
>> jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.util.RunJar
>> /home/SEMDIRECTOR/tdavidson/nutch-2.job
>> org.apache.nutch.crawl.InjectorJob urls
>>
>> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: starting
>>
>> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: urlDir: urls
>>
>> 11/08/01 11:51:55 INFO connection.CassandraHostRetryService: Downed
>> Host Retry service started with queue size -1 and retry delay 10s
>>
>> 11/08/01 11:51:55 INFO service.JmxMonitor: Registering JMX
>> me.prettyprint.cassandra.service_Test
>> Cluster:ServiceType=hector,MonitorType=hector
>>
>> 11/08/01 11:51:55 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
>>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
>>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
>>         at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
>>         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>>         at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>>         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>> Caused by: java.lang.reflect.InvocationTargetException
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>         at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
>>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
>>         ... 12 more
>> Caused by: java.lang.NoSuchMethodError: org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>>         at org.apache.cassandra.thrift.CfDef.<clinit>(CfDef.java:299)
>>         at org.apache.cassandra.thrift.KsDef.read(KsDef.java:753)
>>         at org.apache.cassandra.thrift.Cassandra$describe_keyspace_result.read(Cassandra.java:24338)
>>         at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_keyspace(Cassandra.java:1371)
>>         at org.apache.cassandra.thrift.Cassandra$Client.describe_keyspace(Cassandra.java:1346)
>>         at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:192)
>>         at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:187)
>>         at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>>         at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
>>         at me.prettyprint.cassandra.service.AbstractCluster.describeKeyspace(AbstractCluster.java:201)
>>         at org.apache.gora.cassandra.store.CassandraClient.checkKeyspace(CassandraClient.java:82)
>>         at org.apache.gora.cassandra.store.CassandraClient.init(CassandraClient.java:69)
>>         at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:68)
>>         ... 18 more
>
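On reading the NoSuchMethodError in the trace above: `<init>` is the JVM's name for a constructor, and the descriptor `(BZ)V` means the parameters are a byte and a boolean with a void return. So the caller expected a `FieldValueMetaData(byte, boolean)` constructor that the loaded class did not have. A small reflection check can confirm what the loaded class actually offers. This is just a sketch; `ListConstructors` and `hasConstructor` are hypothetical names, and it needs to run with the same classpath as the failing job.

```java
import java.lang.reflect.Constructor;

public class ListConstructors {
    /** Returns true if the class declares a constructor with exactly these parameter types. */
    static boolean hasConstructor(Class<?> type, Class<?>... params) {
        try {
            type.getDeclaredConstructor(params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Default to the class named in the NoSuchMethodError from this thread.
        String name = args.length > 0 ? args[0]
                : "org.apache.thrift.meta_data.FieldValueMetaData";
        // 'false' avoids running static initializers of the inspected class.
        Class<?> type = Class.forName(name, false, ListConstructors.class.getClassLoader());
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            System.out.println(c);
        }
        // (BZ)V in the error corresponds to a (byte, boolean) constructor.
        System.out.println("(byte, boolean) constructor present: "
                + hasConstructor(type, byte.class, boolean.class));
    }
}
```

If the class is not on the classpath at all, `main` fails with a ClassNotFoundException, which is itself a useful answer.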




--
Lewis
