Once you have some experience with Spark, the natural next step is to follow its development and read its source code in detail. This article explains how to use IntelliJ IDEA to import the latest Spark source code from GitHub and compile it.
Preparation
First, make sure JDK 1.6+ and Scala are installed on your system. After downloading the latest version of IntelliJ IDEA, install the Scala plugin (IDEA will recommend it the first time you open it); I won't go into the details here. At this point you should be able to run Scala from the command line. My environment is as follows:
1. Mac OS X (10.9.5)
2. JDK 1.7.0_71
3. Scala 2.10.4
4. IntelliJ IDEA 14
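Before importing the project, you can double-check that the JDK and Scala are on your PATH (the versions reported will of course depend on your own setup):

java -version     # should report 1.6 or later
scala -version    # should report your installed Scala, e.g. 2.10.4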
In addition, I recommend starting out with a pre-built Spark distribution: run it, get familiar with how it is used, and write a few Spark applications of your own. Only then move on to reading the source code, modifying it, and compiling it by hand.
Import the Spark project from GitHub
After opening IntelliJ IDEA, choose VCS -> Checkout from Version Control -> Git from the menu bar, fill in the address of the Spark project as the Git Repository URL, and choose a suitable local path.
Click Clone in the dialog to start cloning the project from GitHub. Depending on your network speed, this takes roughly 3-10 minutes.
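If you prefer the command line, the equivalent clone (assuming you want the official apache/spark mirror on GitHub) is:

git clone https://github.com/apache/spark.git

You can then point IntelliJ IDEA at the resulting directory instead of going through the VCS menu.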
When the clone completes, IntelliJ IDEA will automatically detect the project's pom.xml file and prompt you to open it. Choose to open the pom.xml directly, and the IDE will resolve the project's dependencies. How long this step takes again depends on your network and system environment.
After this step completes, manually edit the pom.xml file in the root of the Spark project: find the line that specifies the Java version (the java.version property) and adjust it to match your environment. If you are using JDK 1.7, change its value to 1.7 (the default is 1.6).
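A quick way to locate the property (assuming a Unix-like shell) is to grep the root pom.xml; before your edit it will show the 1.6 default:

grep -n "<java.version>" pom.xml
# prints a line like: <java.version>1.6</java.version>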
Now open a terminal, cd into the root directory of the Spark project you just imported, and run:

sbt/sbt assembly
This command compiles Spark with the default configuration. If you want to specify the versions of related components, see the Building Spark page on the official site (http://spark.apache.org/docs/latest/building-spark.html) for all the commonly used build options. I was able to complete this step without a VPN. To gauge the build's progress, you can open another terminal and watch the size of the spark project directory grow; with the default configuration, my spark directory was about 2.0G after a successful build.
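For example, a build that pins component versions, plus a simple way to watch its progress, might look like the following; the -Pyarn and -Phadoop-2.3 profiles are examples from the Building Spark page, and the exact profile names depend on your Spark checkout, so verify them there first:

# build with YARN support against a specific Hadoop version (profiles vary by Spark version)
sbt/sbt -Pyarn -Phadoop-2.3 assembly

# in a second terminal: watch the project directory grow while the build runs
du -sh /path/to/spark    # replace with your actual project path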
To test the result of the build, go into the spark/bin directory and run spark-shell from the command line; if it starts up normally, the build succeeded. A concrete smoke test is sketched below. If you later modify the Spark source code, you can recompile with sbt, and subsequent builds will not take nearly as long as the first one. If you have any questions, feel free to leave a comment!
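Concretely, a minimal smoke test from the project root looks like this (the sample job is just an illustration):

cd bin
./spark-shell
# once the scala> prompt appears, try a tiny job, e.g.:
#   sc.parallelize(1 to 100).sum()
# if it returns 5050.0, the build is working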