Version: 1.3.1.0

Getting Started

This page describes a minimal configuration for running Spark 3.5 against Hive 4.0.1 with the Hive Warehouse Connector.

Prerequisites

HiveServer2 (LLAP) and the Hive Metastore are running and reachable.
Spark has access to the Hive client configuration files (hive-site.xml, core-site.xml, hdfs-site.xml).
If Kerberos is enabled, a valid principal and keytab are available, and HS2 is configured for Kerberos authentication.
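
The second prerequisite can be checked before launching Spark. The following is a minimal sketch, not part of the connector: the function name missing_client_configs and the fallback directory /etc/spark/conf are assumptions made for illustration, and only the three file names come from the list above.

```python
import os
from pathlib import Path

# File names taken from the prerequisites above.
REQUIRED_CLIENT_CONFIGS = ("hive-site.xml", "core-site.xml", "hdfs-site.xml")


def missing_client_configs(conf_dir):
    """Return the required client config files that are absent from conf_dir."""
    base = Path(conf_dir)
    return [name for name in REQUIRED_CLIENT_CONFIGS if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumption: configs live under $SPARK_CONF_DIR, else a hypothetical default.
    conf_dir = os.environ.get("SPARK_CONF_DIR", "/etc/spark/conf")
    missing = missing_client_configs(conf_dir)
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the files exist; it does not validate their contents or that the services they point to are reachable.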

Build the assembly jar

From the repository root:

sbt -Dspark.version=3.5.6 -Dhive.version=4.0.1 -Dscala.version=2.12.18 assembly

The jar is written to target/scala-2.12/.
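
To avoid hard-coding the jar file name in later commands, the output directory can be searched after the build. This is a small sketch under stated assumptions: find_assembly_jars is a hypothetical helper, and the pattern *assembly*.jar is inferred from the jar name used in the commands on this page.

```python
from pathlib import Path


def find_assembly_jars(repo_root=".", scala_binary="2.12"):
    """List assembly jars under target/scala-<scala_binary>/, newest first."""
    out_dir = Path(repo_root) / "target" / f"scala-{scala_binary}"
    # glob() on a missing directory yields nothing, so the result is [] before a build.
    jars = out_dir.glob("*assembly*.jar")
    return sorted(jars, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for jar in find_assembly_jars():
        print(jar)
```

If the list is empty, the build either failed or used a different Scala binary version than the one passed in.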

Spark shell (secure access mode)

spark-shell \
  --jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hs2-host:10001/;transportMode=http;httpPath=cliservice;ssl=true" \
  --conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/hs2-host@EXAMPLE.COM \
  --conf spark.hadoop.hive.metastore.uris=thrift://hms-host:9083 \
  --conf spark.datasource.hive.warehouse.read.mode=secure_access \
  --conf spark.datasource.hive.warehouse.read.jdbc.mode=cluster \
  --conf spark.datasource.hive.warehouse.load.staging.dir=hdfs://nameservice/apps/hwc_staging \
  --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions
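
The same settings can be kept in one mapping and expanded into --conf arguments, which helps when the command is launched from a script. The sketch below reuses the keys and values from the command above; hs2_jdbc_url and to_conf_args are hypothetical helpers introduced for illustration, not connector APIs.

```python
def hs2_jdbc_url(host, port, session_vars):
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/;k=v;k=v."""
    base = f"jdbc:hive2://{host}:{port}/"
    suffix = ";".join(f"{key}={value}" for key, value in session_vars.items())
    return f"{base};{suffix}" if suffix else base


def to_conf_args(conf):
    """Flatten a {key: value} mapping into ['--conf', 'key=value', ...]."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Host names, principal, and paths are the placeholder values used on this page.
HWC_CONF = {
    "spark.sql.hive.hiveserver2.jdbc.url": hs2_jdbc_url(
        "hs2-host", 10001,
        {"transportMode": "http", "httpPath": "cliservice", "ssl": "true"},
    ),
    "spark.sql.hive.hiveserver2.jdbc.url.principal": "hive/hs2-host@EXAMPLE.COM",
    "spark.hadoop.hive.metastore.uris": "thrift://hms-host:9083",
    "spark.datasource.hive.warehouse.read.mode": "secure_access",
    "spark.datasource.hive.warehouse.read.jdbc.mode": "cluster",
    "spark.datasource.hive.warehouse.load.staging.dir": "hdfs://nameservice/apps/hwc_staging",
    "spark.sql.extensions": "com.hortonworks.spark.sql.rule.Extensions",
}

if __name__ == "__main__":
    print(" ".join(to_conf_args(HWC_CONF)))
```

When the resulting list is passed to subprocess.run as an argument list, for example ["spark-shell", "--jars", jar_path, *to_conf_args(HWC_CONF)], no shell is involved, so the semicolons in the JDBC URL need no quoting.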

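Minimal Scala usage

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the SparkSession provided by spark-shell.
val hwc = HiveWarehouseSession.session(spark).build()

// Create the target database if it does not exist yet.
hwc.executeUpdate("create database if not exists hwc_it")

// Sample data: ids 0 to 9 with a derived string column.
val df = spark.range(0, 10).selectExpr("id", "concat('v', id) as v")

// Write through the connector, replacing any existing contents of hwc_it.t_acid.
df.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("database", "hwc_it")
  .option("table", "t_acid")
  .mode("overwrite")
  .save()

// Read back through HWC to check the row count.
hwc.sql("select count(*) as c from hwc_it.t_acid").show()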

PySpark usage

pyspark \
  --jars target/scala-2.12/hive-warehouse-connector-assembly-1.3.1.jar \
  --py-files python/pyspark_hwc-1.3.1.zip

from pyspark.sql import SparkSession

# In the pyspark shell this returns the session the shell already created.
spark = SparkSession.builder.getOrCreate()

spark.sql("show databases").show()
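
The snippet above goes through plain Spark SQL and does not touch the connector. A possible PySpark counterpart to the Scala example is sketched below. It assumes that the zip passed with --py-files exposes a pyspark_llap module with a HiveWarehouseSession class, as upstream Hive Warehouse Connector distributions do; check the contents of the zip for this build before relying on it. The helper names hwc_session and count_rows are hypothetical.

```python
def hwc_session(spark):
    """Build a HiveWarehouseSession from an existing SparkSession.

    Assumption: the --py-files zip provides a `pyspark_llap` module.
    """
    # Imported lazily so this file can be loaded where the zip is not on the path.
    from pyspark_llap import HiveWarehouseSession
    return HiveWarehouseSession.session(spark).build()


def count_rows(spark, table):
    """Run a row count through HWC and return the resulting DataFrame."""
    hwc = hwc_session(spark)
    return hwc.sql(f"select count(*) as c from {table}")
```

In the pyspark shell started as shown above, count_rows(spark, "hwc_it.t_acid").show() would mirror the last line of the Scala example, provided the connection settings from the Spark shell section are also passed to pyspark.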
