nutch-user mailing list archives

From Kaliyug Antagonist <kaliyugantagon...@gmail.com>
Subject Nutch pointed to Cassandra, yet, asks for Hadoop
Date Fri, 23 Feb 2018 18:25:43 GMT
Environment: Windows 10, Nutch 2.3.1, Cassandra 3.11.1

I have extracted and built Nutch under Cygwin's home directory.

I believe that the Cassandra server is working:

INFO  [main] 2018-02-23 16:20:41,077 StorageService.java:1442 -
JOINING: Finish joining ring
INFO  [main] 2018-02-23 16:20:41,820 SecondaryIndexManager.java:509 -
Executing pre-join tasks for: CFS(Keyspace='test',
ColumnFamily='test')
INFO  [main] 2018-02-23 16:20:42,161 StorageService.java:2268 - Node
localhost/127.0.0.1 state jump to NORMAL
INFO  [main] 2018-02-23 16:20:43,049 NativeTransportService.java:75 -
Netty using Java NIO event loop
INFO  [main] 2018-02-23 16:20:43,358 Server.java:155 - Using Netty
Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a,
netty-codec=netty-codec-4.0.44.Final.452812a,
netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a,
netty-codec-http=netty-codec-http-4.0.44.Final.452812a,
netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a,
netty-common=netty-common-4.0.44.Final.452812a,
netty-handler=netty-handler-4.0.44.Final.452812a,
netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb,
netty-transport=netty-transport-4.0.44.Final.452812a,
netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a,
netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a,
netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
INFO  [main] 2018-02-23 16:20:43,359 Server.java:156 - Starting
listening for CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
INFO  [main] 2018-02-23 16:20:43,941 CassandraDaemon.java:527 - Not
starting RPC server as requested. Use JMX
(StorageService->startRPCServer()) or nodetool (enablethrift) to start
it

I did the following check:

apache-cassandra-3.11.1\bin>nodetool status
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  273.97 KiB  256     100.0%            dab932f2-d138-4a1a-acd4-f63cbb16d224  rack1

cqlsh connects:

apache-cassandra-3.11.1\bin>cqlsh

WARNING: console codepage must be set to cp65001 to support utf-8
encoding on Windows platforms.
If you experience encoding problems, change your console codepage with
'chcp 65001' before starting cqlsh.

Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
WARNING: pyreadline dependency missing.  Install to enable tab completion.
cqlsh> describe keyspaces

system_schema  system_auth  system  system_distributed  test  system_traces

I followed the tutorial 'Setting up NUTCH 2.x with CASSANDRA'
<https://wiki.apache.org/nutch/Nutch2Cassandra> and added the respective
entries to the properties and XML files.
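
One way to double-check which datastore those files actually select is
sketched below. The helper name show_datastore and its conf-directory
argument are hypothetical, and the two property names
(gora.datastore.default in gora.properties, storage.data.store.class in
nutch-site.xml) are assumed to be the ones the tutorial refers to; the
message itself does not list them.

```shell
# Hypothetical helper: print the datastore-related lines from a Nutch conf directory.
# Assumption: the relevant properties are gora.datastore.default (gora.properties)
# and storage.data.store.class (nutch-site.xml).
show_datastore() {
  conf_dir="$1"
  # Line(s) in gora.properties that set the default datastore
  # (commented-out lines match too, so read the output carefully).
  grep -n 'gora\.datastore\.default' "$conf_dir/gora.properties"
  # The <name> line in nutch-site.xml plus the line after it,
  # which usually holds the matching <value>.
  grep -n -A1 'storage\.data\.store\.class' "$conf_dir/nutch-site.xml"
}
```

Calling it as `show_datastore conf` from the Nutch directory would show
whether both files name a Cassandra store class; the directory to pass
depends on where the edits were made.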

I then go to the Cygwin prompt and attempt a crawl. Instead of using
Cassandra, it asks for Hadoop (probably for HBase):

/home/apache-nutch-2.3.1
$ ./runtime/deploy/bin/crawl urls/ crawl/ 1
No SOLRURL specified. Skipping indexing.
which: no hadoop in (<dump of the classpath entries>)
Can't find Hadoop executable. Add HADOOP_HOME/bin to the path or run
in local mode.
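
The last line of that output offers two options: put Hadoop on the path
or run in local mode. A minimal sketch of choosing between them follows.
It assumes the build produced a runtime/local directory alongside
runtime/deploy (only runtime/deploy appears in the message), and the
helper name pick_crawl_script is hypothetical.

```shell
# Hypothetical helper: choose which Nutch crawl script to run.
# Assumption: the build created both runtime/local and runtime/deploy
# under the Nutch directory passed as the first argument.
pick_crawl_script() {
  nutch_home="$1"
  if command -v hadoop >/dev/null 2>&1; then
    # A hadoop executable is on PATH, so the deploy script can find it.
    echo "$nutch_home/runtime/deploy/bin/crawl"
  else
    # No hadoop on PATH: fall back to the local-mode script.
    echo "$nutch_home/runtime/local/bin/crawl"
  fi
}

# Usage, with the same arguments as the command above:
#   "$(pick_crawl_script /home/apache-nutch-2.3.1)" urls/ crawl/ 1
```

This only picks a script; it does not touch the storage configuration,
so whether the crawl then reaches Cassandra still depends on the
properties and XML entries.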


