Installing Livy on a Hadoop Cluster

Purpose

Livy is an open source REST server for Apache Spark that lets you submit jobs and interactive code to your Spark cluster over HTTP. You can view the source code here: https://github.com/cloudera/livy

In this post I will go over the steps needed to install Livy on a Hadoop cluster. The steps are derived from the source code repository linked above, but this post adds detail on how to test the installation in a simpler way.

Install Steps

  1. Determine which node in your cluster will act as the Livy server
    1. Note: the server will need to have the Hadoop and Spark libraries and configurations deployed on it.
  2. Log in to the machine as root
  3. Download the Livy source code
    cd /opt
    wget https://github.com/cloudera/livy/archive/v0.2.0.zip
    unzip v0.2.0.zip
    cd livy-0.2.0
  4. Get the version of Spark that is currently installed on your cluster
    1. Run the following command
      spark-submit --version
    2. Example: 1.6.0
    3. Use this value in downstream commands as {SPARK_VERSION}
  5. Build the Livy source code with Maven
    /usr/local/apache-maven/apache-maven-3.0.4/bin/mvn -DskipTests=true -Dspark.version={SPARK_VERSION} clean package
  6. You're done!
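
Steps 4 and 5 can also be scripted instead of copying the version by hand. The following is a minimal sketch, assuming the spark-submit --version banner contains a line ending in "version X.Y.Z" (as Spark 1.6 prints) and that the first such line is the Spark version. parse_spark_version is a hypothetical helper name, not part of Spark or Livy.

```shell
# Hypothetical helper: read the `spark-submit --version` banner on stdin and
# print the first token that follows the word "version".
# Assumption: the Spark banner line appears before any other "version" line
# (for example a later "Using Scala version ..." line).
parse_spark_version() {
    sed -n 's/.*version \([0-9][^ ,]*\).*/\1/p' | head -n 1
}

# Example usage, commented out so the sketch can be sourced on a machine
# without Spark installed. 2>&1 is used because the banner may be written
# to standard error rather than standard output.
#   SPARK_VERSION=$(spark-submit --version 2>&1 | parse_spark_version)
#   mvn -DskipTests=true -Dspark.version="$SPARK_VERSION" clean package
```

Check the extracted value before building. Vendor distributions may append a suffix to the version string, and whether Maven needs that suffix depends on your environment.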

Steps to Control Livy

Get Status

ps -eaf | grep livy

If Livy is running, the output includes a line like the following (the grep process itself may also be listed):

root      9379     1 14 18:28 pts/0    00:00:01 java -cp /opt/livy-0.2.0/server/target/jars/*:/opt/livy-0.2.0/conf:/etc/hadoop/conf: com.cloudera.livy.server.LivyServer

Start

Note: Run as root

cd /opt/livy-0.2.0/
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/livy-server start

Once started, the Livy Server can be called with the following host and port:

http://localhost:8998

If you’re calling it from another machine, replace “localhost” with the public IP or hostname of the Livy server.
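
The server may not accept requests the instant the start command returns, so a script that starts Livy and then calls it can poll first. The following is a minimal sketch, assuming a GET on /sessions answers HTTP 200 once the server is up. wait_for_livy is a hypothetical helper name, and the attempt count and delay are arbitrary defaults.

```shell
# Hypothetical helper: poll GET /sessions until the server answers HTTP 200.
# Arguments: base URL, maximum attempts, seconds between attempts.
wait_for_livy() {
    wfl_url=${1:-http://localhost:8998}
    wfl_tries=${2:-30}
    wfl_delay=${3:-2}
    wfl_i=0
    while [ "$wfl_i" -lt "$wfl_tries" ]; do
        # -s: quiet, -o /dev/null: discard the body, -w: print only the status code.
        # If curl cannot connect at all, fall back to "000".
        wfl_code=$(curl -s -o /dev/null -w '%{http_code}' "$wfl_url/sessions") || wfl_code=000
        if [ "$wfl_code" = "200" ]; then
            return 0
        fi
        wfl_i=$((wfl_i + 1))
        sleep "$wfl_delay"
    done
    # Gave up: the server never answered with 200.
    return 1
}

# Example usage (commented out so the sketch can be sourced without a server):
#   ./bin/livy-server start
#   wait_for_livy http://localhost:8998 30 2 && echo "Livy is up"
```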

Stop

Note: Run as root

cd /opt/livy-0.2.0/
./bin/livy-server stop

Testing Livy

These steps assume you are running the commands on the machine where Livy was installed, which is why they use localhost. To test from another machine, replace “localhost” with the public IP or hostname of the Livy server.

  1. Create a new Livy Session
    1. Curl Command
      curl -H "Content-Type: application/json" -X POST -d '{"kind":"spark"}' -i http://localhost:8998/sessions
    2. Output
      HTTP/1.1 201 Created
      Date: Wed, 02 Nov 2016 22:38:13 GMT
      Content-Type: application/json; charset=UTF-8
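      Location: /sessions/1
      Content-Length: 81
      Server: Jetty(9.2.16.v20160414)
      
      {"id":1,"owner":null,"proxyUser":null,"state":"starting","kind":"spark","log":[]}
    3. Note: the "id" in the response identifies the new session. This output was captured from a session with id 1, while the remaining examples were captured against a session with id 0. Substitute the id returned by your own call, and wait until the session state changes from "starting" to "idle" before submitting code.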
  2. View Current Livy Sessions
    1. Curl Command
      curl -H "Content-Type: application/json" -i http://localhost:8998/sessions
    2. Output
      HTTP/1.1 200 OK
      Date: Tue, 08 Nov 2016 02:30:34 GMT
      Content-Type: application/json; charset=UTF-8
      Content-Length: 111
      Server: Jetty(9.2.16.v20160414)
      
      {"from":0,"total":1,"sessions":[{"id":0,"owner":null,"proxyUser":null,"state":"idle","kind":"spark","log":[]}]}
  3. Get Livy Session Info
    1. Curl Command
      curl -H "Content-Type: application/json" -i http://localhost:8998/sessions/0
    2. Output
      HTTP/1.1 200 OK
      Date: Tue, 08 Nov 2016 02:31:04 GMT
      Content-Type: application/json; charset=UTF-8
      Content-Length: 77
      Server: Jetty(9.2.16.v20160414)
      
      {"id":0,"owner":null,"proxyUser":null,"state":"idle","kind":"spark","log":[]}
  4. Submit job to Livy
    1. Curl Command
      curl -H "Content-Type: application/json" -X POST -d '{"code":"println(sc.parallelize(1 to 5).collect())"}' -i http://localhost:8998/sessions/0/statements
    2. Output
      HTTP/1.1 201 Created
      Date: Tue, 08 Nov 2016 02:31:29 GMT
      Content-Type: application/json; charset=UTF-8
      Location: /sessions/0/statements/0
      Content-Length: 40
      Server: Jetty(9.2.16.v20160414)
      
      {"id":0,"state":"running","output":null}
  5. Get Job Status and Output
    1. Curl Command
      curl -H "Content-Type: application/json" -i http://localhost:8998/sessions/0/statements/0
    2. Output
      HTTP/1.1 200 OK
      Date: Tue, 08 Nov 2016 02:32:15 GMT
      Content-Type: application/json; charset=UTF-8
      Content-Length: 109
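      Server: Jetty(9.2.16.v20160414)
      
      {"id":0,"state":"available","output":{"status":"ok","execution_count":0,"data":{"text/plain":"[I@6270e14a"}}}
    3. Note: [I@6270e14a is the JVM's default string form of an integer array, not its contents. To print the elements themselves, use collect().mkString(",") in the submitted code.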
  6. Delete Session
    1. Curl Command
      curl -X DELETE http://localhost:8998/sessions/0
    2. Output
      {"msg":"deleted"}
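
The steps above can be chained into one script that waits for each state change instead of re-running curl by hand. The following is a minimal sketch, not a definitive client. It assumes the server returns compact JSON shaped like the captured outputs above, and it extracts fields with sed as a stand-in for a real JSON parser. The names json_state, json_id, wait_for_state and run_livy_demo are hypothetical. Treating "error" and "dead" as terminal states is an assumption based on the session states Livy documents.

```shell
LIVY_URL=${LIVY_URL:-http://localhost:8998}

# Extract the "state" field / the leading "id" field from a JSON object on stdin.
json_state() { sed -n 's/.*"state" *: *"\([^"]*\)".*/\1/p' | head -n 1; }
json_id()    { sed -n 's/^{ *"id" *: *\([0-9][0-9]*\).*/\1/p' | head -n 1; }

# Poll a session or statement URL until its state equals the wanted value.
# Arguments: URL, wanted state, maximum attempts, seconds between attempts.
# Returns 0 on success, 1 on timeout, 2 if an assumed terminal state is seen.
wait_for_state() {
    wfs_url=$1; wfs_want=$2; wfs_tries=${3:-60}; wfs_delay=${4:-2}
    wfs_i=0
    while [ "$wfs_i" -lt "$wfs_tries" ]; do
        wfs_state=$(curl -s "$wfs_url" | json_state) || wfs_state=
        case $wfs_state in
            "$wfs_want") return 0 ;;
            error|dead)  return 2 ;;   # assumption: these states never recover
        esac
        wfs_i=$((wfs_i + 1))
        sleep "$wfs_delay"
    done
    return 1
}

run_livy_demo() {
    # 1. Create a session and remember the id the server assigned.
    sid=$(curl -s -H "Content-Type: application/json" -X POST \
        -d '{"kind":"spark"}' "$LIVY_URL/sessions" | json_id)
    [ -n "$sid" ] || return 1

    # 2. Wait for the session to go from "starting" to "idle".
    wait_for_state "$LIVY_URL/sessions/$sid" idle || return 1

    # 3. Submit the same statement used in step 4 above.
    stid=$(curl -s -H "Content-Type: application/json" -X POST \
        -d '{"code":"println(sc.parallelize(1 to 5).collect())"}' \
        "$LIVY_URL/sessions/$sid/statements" | json_id)
    [ -n "$stid" ] || return 1

    # 4. Wait for the statement to become "available", then print its JSON.
    wait_for_state "$LIVY_URL/sessions/$sid/statements/$stid" available || return 1
    curl -s "$LIVY_URL/sessions/$sid/statements/$stid"; echo

    # 5. Delete the session so it does not hold cluster resources.
    curl -s -X DELETE "$LIVY_URL/sessions/$sid"; echo
}
```

To try it, source the file and call run_livy_demo, optionally after setting LIVY_URL to point at another host. If jq is available on your machine, replacing the sed helpers with it would be more robust.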