Apache Zeppelin 0.5.6 : spark 1.6.1 client : hadoop 2.6.0-cdh5.5.1 in remote cluster

Some trials, some errors, some success…

Downloaded the latest release, which at this time is 0.5.6, and unzipped it into an incubator-zeppelin folder. Built it using this command:

mvn clean package -DskipTests -Pspark-1.5 -Dspark.version=1.6.1 -Dhadoop.version=2.6.0-cdh5.5.1 -Phadoop-2.6 -Pyarn

caveat coder:

  • -Pspark-1.5 corresponds to the profile in the pom, not my actual spark version
  • I specifically used a spark build that used scala 2.10: I ran into ClassNotFoundExceptions when trying to run zeppelin with a spark built on scala 2.11.
  • zeppelin seems to require java7 at present
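Every -D property in that build command needs the name=value form, and it is easy to mistype one. A minimal sketch that assembles the command from two arguments; the function name zeppelin_build_cmd is hypothetical, the flags are just the ones from the command above:

```shell
#!/bin/sh
# Hypothetical helper: prints the Maven build command for a Spark/Hadoop pair.
# The profile stays -Pspark-1.5 because that is the profile name in the pom,
# not the Spark version being targeted.
zeppelin_build_cmd() {
    spark_version="$1"
    hadoop_version="$2"
    printf 'mvn clean package -DskipTests -Pspark-1.5 -Dspark.version=%s -Dhadoop.version=%s -Phadoop-2.6 -Pyarn\n' \
        "$spark_version" "$hadoop_version"
}

# Print the command for the versions used in this post.
zeppelin_build_cmd 1.6.1 2.6.0-cdh5.5.1
```

It only prints the command, so you can eyeball it before running it.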

Configured conf/zeppelin-env.sh:

export JAVA_HOME=/Users/lhurley/software/jdk1.7.0_79
export HADOOP_CONF_DIR=/Users/lhurley/software/hadoop/etc/hadoop
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.6.0-cdh5.5.1"
export SPARK_HOME=/Users/lhurley/software/spark
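A wrong path in any of these exports tends to show up only later as a confusing startup failure, so it may be worth checking them first. A small sketch, assuming the variables from the config above are already exported in your shell; check_dirs is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: reports every argument that is not an existing directory.
# Returns non-zero if at least one directory is missing.
check_dirs() {
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            status=1
        fi
    done
    return "$status"
}

# Example, using the variables exported in the config above:
# check_dirs "$JAVA_HOME" "$HADOOP_CONF_DIR" "$SPARK_HOME"
```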

Ran the zeppelin daemon:

./bin/zeppelin-daemon.sh start

caveat coder:

  1. trying to run the daemon in a tmux shell failed:

"Zeppelin process died  [FAILED]"

and you see this in the log file:

"nohup: can't detach from console: No such file or directory"

so: don’t run it in tmux…

  2. running over vpn against the remote cluster I got this:

java.net.UnknownHostException: lhurley-mac: nodename nor servname provided, or not known.

I had to add my hostname to /etc/hosts: localhost lhurley-mac
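Both caveats can be checked before starting the daemon. A rough sketch, with assumptions: tmux sets the TMUX environment variable inside its sessions, and a hosts file line is an address followed by one or more names. The function names in_tmux, host_in_file and preflight are made up for this example:

```shell
#!/bin/sh
# True when running inside a tmux session (tmux exports TMUX there).
in_tmux() {
    [ -n "${TMUX:-}" ]
}

# True if NAME is listed as a host name in FILE (default /etc/hosts).
# Comments are stripped; field 1 is the address, so names start at field 2.
host_in_file() {
    awk -v name="$1" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}

# Runs both checks and returns non-zero if either one fails.
preflight() {
    failed=0
    if in_tmux; then
        echo "inside tmux: start Zeppelin from a plain terminal instead" >&2
        failed=1
    fi
    if ! host_in_file "$(hostname)"; then
        echo "$(hostname) is not listed in /etc/hosts" >&2
        failed=1
    fi
    return "$failed"
}

# Example:
# preflight && ./bin/zeppelin-daemon.sh start
```

This only looks at the hosts file, so a hostname that resolves some other way (DNS, for example) will still be reported as missing.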

Here is my sample notebook based on a music recommender from Advanced Analytics with Spark.
