Version: 1.3.1.0

Getting Started

This page covers a minimal setup for Spark 3.5 and Hive 4.0.1 using the Hive Warehouse Connector (HWC).

Prerequisites

  • HiveServer2 (LLAP) and Hive Metastore are running and reachable.
  • Spark has access to the Hive client configs (hive-site.xml, core-site.xml, hdfs-site.xml); a quick way to verify this follows the list below.
  • If Kerberos is enabled, a valid principal and keytab are available and HS2 is configured for Kerberos auth.
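
To confirm Spark can actually see those client configs, a quick check (a sketch; run it in any Spark shell on the cluster) is to read a Hive setting back from the Hadoop configuration:

// If hive-site.xml is on the classpath this returns the metastore URI,
// e.g. thrift://hms-host:9083; null means the configs are not visible.
spark.sparkContext.hadoopConfiguration.get("hive.metastore.uris")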

Build the assembly jar

From the repo root:

sbt -Dspark.version=3.5.6 -Dhive.version=4.0.1 -Dscala.version=2.12.18 assembly

The jar will be under target/scala-2.12/.

Spark shell (secure access mode)

spark-shell \
  --jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hs2-host:10001/;transportMode=http;httpPath=cliservice;ssl=true" \
  --conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/hs2-host@EXAMPLE.COM \
  --conf spark.hadoop.hive.metastore.uris=thrift://hms-host:9083 \
  --conf spark.datasource.hive.warehouse.read.mode=secure_access \
  --conf spark.datasource.hive.warehouse.read.jdbc.mode=cluster \
  --conf spark.datasource.hive.warehouse.load.staging.dir=hdfs://nameservice/apps/hwc_staging \
  --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions
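
Once the shell is up, the settings can be read back from the session to confirm they were applied:

// Run inside the spark-shell started above; each call returns the
// corresponding --conf value (or throws NoSuchElementException if unset).
spark.conf.get("spark.sql.hive.hiveserver2.jdbc.url")
spark.conf.get("spark.datasource.hive.warehouse.read.mode")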

Minimal Scala usage

import com.hortonworks.hwc.HiveWarehouseSession

val hwc = HiveWarehouseSession.session(spark).build()

hwc.executeUpdate("create database if not exists hwc_it")

val df = spark.range(0, 10).selectExpr("id", "concat('v', id) as v")

df.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("database", "hwc_it")
  .option("table", "t_acid")
  .mode("overwrite")
  .save()

hwc.sql("select count(*) as c from hwc_it.t_acid").show()

PySpark usage

pyspark \
  --jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
  --py-files python/pyspark_hwc-1.3.1.zip

Pass the same --conf options as in the spark-shell example above. Then, in the PySpark shell (the HWC import at the end is an assumption based on the bindings zip; adjust it if your build packages the module differently):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("show databases").show()

# HWC session; module name assumed from pyspark_hwc-1.3.1.zip, adjust if your build differs
from pyspark_llap import HiveWarehouseSession
hwc = HiveWarehouseSession.session(spark).build()
hwc.sql("show databases").show()