Getting Started
This page covers a minimal setup of the Hive Warehouse Connector (HWC) with Spark 3.5 and Hive 4.0.1.
Prerequisites
- HiveServer2 (LLAP) and Hive Metastore are running and reachable.
- Spark has access to the Hive client configs (hive-site.xml, core-site.xml, hdfs-site.xml).
- If Kerberos is enabled, a valid principal and keytab are available (see the kinit example below) and HS2 is configured for Kerberos auth.
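For the Kerberos case, a ticket can be obtained up front with kinit; the keytab path and principal below are placeholders for your environment:

kinit -kt /etc/security/keytabs/user.keytab user@EXAMPLE.COM
klist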
Build the assembly jar
From the repo root:
sbt -Dspark.version=3.5.6 -Dhive.version=4.0.1 -Dscala.version=2.12.18 assembly
The jar will be under target/scala-2.12/.
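To confirm the build output before moving on (the exact file name depends on the versions passed to sbt):

ls target/scala-2.12/hive-warehouse-connector-assembly-*.jar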
Spark shell (secure access mode)
spark-shell \
--jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hs2-host:10001/;transportMode=http;httpPath=cliservice;ssl=true" \
--conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/hs2-host@EXAMPLE.COM \
--conf spark.hadoop.hive.metastore.uris=thrift://hms-host:9083 \
--conf spark.datasource.hive.warehouse.read.mode=secure_access \
--conf spark.datasource.hive.warehouse.read.jdbc.mode=cluster \
--conf spark.datasource.hive.warehouse.load.staging.dir=hdfs://nameservice/apps/hwc_staging \
--conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions
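On a kerberized YARN cluster, Spark can also manage ticket renewal itself. --principal and --keytab are standard spark-shell/spark-submit options; the values below are placeholders, and the remaining flags are the same as above:

spark-shell \
--principal user@EXAMPLE.COM \
--keytab /etc/security/keytabs/user.keytab \
... (same --jars and --conf options as above)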
Minimal Scala usage
import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the active SparkSession.
val hwc = HiveWarehouseSession.session(spark).build()

// DDL runs against HiveServer2 over JDBC.
hwc.executeUpdate("create database if not exists hwc_it")

// Write a small DataFrame to a Hive-managed table through the connector.
val df = spark.range(0, 10).selectExpr("id", "concat('v', id) as v")
df.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("database", "hwc_it")
  .option("table", "t_acid")
  .mode("overwrite")
  .save()

// Query the table back through the HWC session.
hwc.sql("select count(*) as c from hwc_it.t_acid").show()
PySpark usage
pyspark \
--jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
--py-files python/pyspark_hwc-1.3.1.zip

Pass the same --conf options as in the spark-shell example so the connector can reach HS2 and the metastore.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sanity check: the session starts with the HWC jar and Python bindings on the path.
spark.sql("show databases").show()