Integrations and Examples
This page summarizes how to integrate the Hive Warehouse Connector (HWC) into Spark applications and includes example code snippets.
spark-submit
Include the HWC assembly jar and the required Spark configuration properties on the spark-submit command line:
spark-submit \
--class com.yourorg.YourApp \
--jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hs2-host:10001/;transportMode=http;httpPath=cliservice;ssl=true" \
--conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/hs2-host@EXAMPLE.COM \
--conf spark.datasource.hive.warehouse.read.mode=secure_access \
--conf spark.datasource.hive.warehouse.load.staging.dir=hdfs://nameservice/apps/hwc_staging \
your-app.jar
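For context, a minimal sketch of what the application class referenced by --class might look like; the object name and the query are illustrative placeholders, and the HWC session picks up the JDBC URL, principal, and read mode from the configs passed above:

import com.hortonworks.hwc.HiveWarehouseSession
import org.apache.spark.sql.SparkSession

object YourApp {
  def main(args: Array[String]): Unit = {
    // The SparkSession carries the HWC settings supplied via --conf
    val spark = SparkSession.builder().appName("hwc-example").getOrCreate()

    // Build the HWC session used for reads and writes against HiveServer2
    val hwc = HiveWarehouseSession.session(spark).build()

    // Illustrative query; replace with your own tables
    hwc.executeQuery("SELECT * FROM default.source").show()

    spark.stop()
  }
}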
PySpark
Launch PySpark with the assembly jar and the HWC Python package:
pyspark \
--jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
--py-files python/pyspark_hwc-1.3.1.zip
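Once the shell is up, HWC is used through its Python bindings; a minimal sketch, assuming the pyspark_llap module bundled in the zip above and an illustrative table name:

from pyspark_llap import HiveWarehouseSession

# Build the HWC session from the running SparkSession
hive = HiveWarehouseSession.session(spark).build()

# Run a query through HiveServer2 and show the result
hive.executeQuery("SELECT * FROM default.source").show()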
Example: simple ETL
The following Scala snippet reads a Hive table through HWC, filters it, and writes the result back to a Hive table:
// Build the HWC session (picks up the HiveServer2 JDBC URL and principal from the Spark configs)
val hwc = com.hortonworks.hwc.HiveWarehouseSession.session(spark).build()

// Read the source table through HWC and filter it
val src = hwc.executeQuery("SELECT * FROM default.source")
val transformed = src.filter("flag = true")

// Write the result back to Hive through the HWC data source
transformed.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("database", "default")
  .option("table", "target")
  .mode("overwrite")
  .save()
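To verify the load, the same session can read the table back through HWC (a sketch; the count query is illustrative):

// Read the written rows back through HiveServer2
val check = hwc.executeQuery("SELECT COUNT(*) AS cnt FROM default.target")
check.show()

The fully qualified format string above can also be referenced through the HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR constant rather than being spelled out.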