Tuesday 15 December 2015

Integrating Tableau Desktop with Spark SQL

In this post we will see how to integrate Tableau Desktop with Spark SQL. Tableau’s integration with Spark brings tremendous value to the Spark community – we can visually analyse data without writing a single line of Spark SQL code. That is a big deal, because a visual interface to our data extends Spark beyond data scientists and data engineers to all business users. The Spark connector takes advantage of Tableau’s flexible connection architecture, which gives customers the option to connect live and issue interactive queries, or to use Tableau’s fast in-memory database engine.

Software requirements :-

We will be using the following software for the integration -
1. Tableau Desktop 9.2.0
2. Hive 1.2.1
3. Spark 1.5.2 for Hadoop 2.6.0

We could skip Hive and work with Spark SQL directly. For this example, though, we will use Hive, import the Hive tables into Spark SQL, and then integrate them with Tableau Desktop.

Hive Setup :-

1. Download and install Hive 1.2.1.
2. Download the MySQL connector JAR file and copy it to the ${HIVE_HOME}/lib directory so Hive uses MySQL as its metastore.
3. Start Hive: ${HIVE_HOME}/bin $ ./hive
4. Create a table and insert some data into it:

create table product(productid INT, productname STRING, price FLOAT, category STRING) ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ',';

INSERT INTO TABLE product VALUES(1,'Book',25,'Stationery');
INSERT INTO TABLE product VALUES(2,'Pens',10,'Stationery');
INSERT INTO TABLE product VALUES(3,'Sugar',40.05,'House Hold Item');
INSERT INTO TABLE product VALUES(4,'Furniture',1200,'Interiors');

Hive setup is complete now.
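Before moving on, it is worth sanity-checking the data from the Hive CLI. A couple of quick queries (just a suggested check, not part of the original setup) should show the four rows inserted above:

```sql
-- list the rows we just inserted
SELECT * FROM product;

-- or simply confirm the row count
SELECT COUNT(*) FROM product;
```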

Spark Setup :-

1. Download and extract Spark 1.5.2 for Hadoop 2.6.0
2. Copy hive-site.xml from ${HIVE_HOME}/conf directory to ${SPARK_HOME}/conf directory
3. In the copied hive-site.xml, strip the time-unit suffixes from duration values (e.g. change 0s to 0, or <xyz>ms to <xyz>); otherwise Spark may throw a NumberFormatException when parsing them.
4. Define the Spark master IP with export SPARK_MASTER_IP=<host_ip_addr> in the spark-env.sh file located in the ${SPARK_HOME}/conf directory (without this the Thrift server will not work).

5. Start the Spark master and slaves
  1. ${SPARK_HOME}/sbin $./start-master.sh 
  2. ${SPARK_HOME}/sbin $./start-slaves.sh 
6. Go to http://localhost:8080/ and verify that the worker has started.
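As a concrete illustration of step 3, here is what the fix looks like for one duration property in the copied hive-site.xml (the property shown is just one example – any value carrying a time suffix needs the same treatment):

```xml
<!-- before: Spark's parser chokes on the "s" suffix -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>600s</value>
</property>

<!-- after: plain number, no time unit -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>600</value>
</property>
```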

Now it's time to start the Thrift server -

7. ${SPARK_HOME}/sbin $ ./start-thriftserver.sh --master spark://<spark_host_ip>:<port> --driver-class-path ../lib/mysql-connector-java-5.1.34.jar --hiveconf hive.server2.thrift.bind.host=localhost --hiveconf hive.server2.thrift.port=10001

This starts the Thrift server on port 10001.
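Before switching to Tableau, we can also verify that the Thrift server accepts JDBC connections using the beeline client that ships with Hive (the host and port come from the command above; this check requires the server to be running):

```shell
# connect to the Spark Thrift server over JDBC
${HIVE_HOME}/bin/beeline -u jdbc:hive2://localhost:10001
```

Inside the beeline session, `show tables;` should list the Hive tables imported into Spark SQL, and `select * from product;` should return the rows inserted earlier.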


8. Go to http://localhost:8080/ and verify that the Spark SQL application has started.

Now go to Tableau Desktop
  1. Select the Spark SQL connector.
  2. Enter localhost as the host and the Thrift server port from step 7, here 10001.
  3. Select SparkThriftServer as the type and User Name as the authentication method.
  4. Leave the remaining fields empty and click OK.
You are done! Happy report building with Tableau and Spark.