Building Apache Spark from source is never quick: a full build with either mvn or sbt takes quite a lot of time. Incremental compilation, or continuous compilation, is the time saver.
1. sbt
The official Spark build documentation recommends sbt for day-to-day development:
"But SBT is supported for day-to-day development since it can provide much faster iterative compilation."
The first step is to create a fat jar, which includes all of Spark’s dependencies:
./build/sbt -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -Dscala-2.11 -Phive -Phive-thriftserver -DskipTests assembly
We will have a large jar file like this:
$ ls -hl assembly/target/scala-2.11
total 184M
-rw-rw-r-- 1 ubuntu ubuntu 184M Oct 31 15:28 spark-assembly-1.6.1-hadoop2.7.3.jar
Then create a separate jar package for Spark itself; incremental compilation will be conducted on this jar.
./build/sbt -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -Dscala-2.11 -Phive -Phive-thriftserver -DskipTests package
Then we will have:
~/spark-1.6.1
$ ls -hl assembly/target/scala-2.11
total 184M
-rw-rw-r-- 1 ubuntu ubuntu 184M Oct 31 15:28 spark-assembly-1.6.1-hadoop2.7.3.jar
-rw-rw-r-- 1 ubuntu ubuntu 281 Oct 31 15:42 spark-assembly_2.11-1.6.1.jar
Or in an interactive way:
$ ./build/sbt -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -Dscala-2.11 -Phive -Phive-thriftserver -DskipTests
> assembly
....
> package
Enter the incremental compilation mode:
$ ./build/sbt -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -Dscala-2.11 -Phive -Phive-thriftserver -DskipTests
> ~compile
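Triggered execution can also be scoped to a single module, so that only the code you are editing is rebuilt on each change. A sketch in the same sbt shell (the core project name matches Spark's module layout; treat the exact sequence as illustrative):

```
> project core
> ~compile
```

Press Enter to leave the watch loop and return to the sbt prompt.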
To launch Spark from the separate jar, we have to set an environment variable:
$ export SPARK_PREPEND_CLASSES=true
$ ./sbin/start-all.sh
For more details, refer to http://www.voidcn.com/blog/lovehuangjiaju/article/p-4669432.html
With this variable set, bin/spark-class builds the launch classpath differently:
# Add the launcher build dir to the classpath if requested.
if [ -n "$SPARK_PREPEND_CLASSES" ]; then
  LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
fi
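The effect of that snippet can be sketched in isolation; a minimal, self-contained version (SPARK_HOME and the initial classpath below are illustrative values, not a real install):

```shell
# Sketch of the spark-class logic: when SPARK_PREPEND_CLASSES is set,
# the locally built launcher classes are prepended to the classpath,
# so freshly compiled code shadows the classes in the assembly jar.
SPARK_HOME=/opt/spark
SPARK_SCALA_VERSION=2.11
LAUNCH_CLASSPATH="$SPARK_HOME/assembly/target/scala-2.11/spark-assembly-1.6.1-hadoop2.7.3.jar"
SPARK_PREPEND_CLASSES=true

# Add the launcher build dir to the classpath if requested.
if [ -n "$SPARK_PREPEND_CLASSES" ]; then
  LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
fi
echo "$LAUNCH_CLASSPATH"
```

Because the classes directory comes first, the JVM resolves classes from the incremental build output before falling back to the assembly jar.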
2. mvn
For mvn, the workflow is similar. First do a full build and install the artifacts into the local Maven repository:
$ ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -Phive -Phive-thriftserver -Dscala-2.11 -DskipTests clean install
Then, as the official Building Spark documentation suggests, run continuous compilation within a module:
$ cd core
$ ../build/mvn scala:cc
Also set the SPARK_SCALA_VERSION environment variable (2.11 in this build).