Blog Archives

The inevitable “Task not serializable” SparkException

The good old org.apache.spark.SparkException: Task not serializable usually surfaces at least once in a Spark developer’s career, or in my case, whenever enough time has gone by since I last saw it that I’ve conveniently forgotten its existence and the fact

Posted in code

A week in ML part 2: algorithms

The previous post established the need for a thorough understanding of the data. Assuming we know our data, we can now consider the algorithms. Our chosen problem was a prediction question: clearly a supervised learning exercise, and as we expected

Posted in machine learning, Uncategorized

Apache Zeppelin 0.5.6 : spark 1.6.1 client : hadoop 2.6.0-cdh5.5.1 in remote cluster

Some trials, some errors, some success… Downloaded the latest release, which at this time is 0.5.6, from here and unzipped it into the incubator-zeppelin folder. Built using this command: mvn clean package -DskipTests -Pspark-1.5 -Dspark.version=1.6.1 -Dhadoop.version=2.6.0-cdh5.5.1 -Phadoop-2.6 -Pyarn caveat coder: -Pspark-1.5 corresponds to the

Posted in code