Integrations and Examples
This page summarizes how to integrate the Hive Warehouse Connector (HWC) into Spark applications and includes example code snippets.
spark-submit
Include the HWC assembly jar and the required Spark configuration properties on the spark-submit command line:
spark-submit \
--class com.yourorg.YourApp \
--jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hs2-host:10001/;transportMode=http;httpPath=cliservice;ssl=true" \
--conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/hs2-host@EXAMPLE.COM \
--conf spark.datasource.hive.warehouse.read.mode=secure_access \
--conf spark.datasource.hive.warehouse.load.staging.dir=hdfs://nameservice/apps/hwc_staging \
your-app.jar
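For context, a minimal sketch of what the application class referenced by --class might look like; the object name and the query are illustrative placeholders, and the HWC session picks up the JDBC URL, principal, and read mode from the configs passed above:

import com.hortonworks.hwc.HiveWarehouseSession
import org.apache.spark.sql.SparkSession

object YourApp {
  def main(args: Array[String]): Unit = {
    // The SparkSession carries the HWC settings supplied via --conf
    val spark = SparkSession.builder().appName("hwc-example").getOrCreate()

    // Build the HWC session used for reads and writes against HiveServer2
    val hwc = HiveWarehouseSession.session(spark).build()

    // Illustrative query; replace with your own tables
    hwc.executeQuery("SELECT * FROM default.source").show()

    spark.stop()
  }
}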
PySpark
Launch PySpark with the assembly jar and the HWC Python package:
pyspark \
--jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
--py-files python/pyspark_hwc-1.3.1.zip
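Once the shell is up, HWC is used through its Python bindings; a minimal sketch, assuming the pyspark_llap module bundled in the zip above and an illustrative table name:

from pyspark_llap import HiveWarehouseSession

# Build the HWC session from the running SparkSession
hive = HiveWarehouseSession.session(spark).build()

# Run a query through HiveServer2 and show the result
hive.executeQuery("SELECT * FROM default.source").show()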
Example: simple ETL
The following Scala snippet reads a Hive table through HWC, filters it, and writes the result back to a Hive table:
// Build the HWC session (picks up the HiveServer2 JDBC URL and principal from the Spark configs)
val hwc = com.hortonworks.hwc.HiveWarehouseSession.session(spark).build()

// Read the source table through HWC and filter it
val src = hwc.executeQuery("SELECT * FROM default.source")
val transformed = src.filter("flag = true")

// Write the result back to Hive through the HWC data source
transformed.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("database", "default")
  .option("table", "target")
  .mode("overwrite")
  .save()
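To verify the load, the same session can read the table back through HWC (a sketch; the count query is illustrative):

// Read the written rows back through HiveServer2
val check = hwc.executeQuery("SELECT COUNT(*) AS cnt FROM default.target")
check.show()

The fully qualified format string above can also be referenced through the HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR constant rather than being spelled out.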