Version: 1.3.1.0

Configuration

This page summarizes the core Spark and Hive settings used by HWC.

Core properties

Property                                            Purpose
spark.sql.hive.hiveserver2.jdbc.url                 HS2 JDBC URL (LLAP).
spark.sql.hive.hiveserver2.jdbc.url.principal       Kerberos principal for HS2.
spark.hadoop.hive.metastore.uris                    HMS Thrift URI(s).
spark.datasource.hive.warehouse.read.mode           Read mode: direct_reader_v1, direct_reader_v2, jdbc_client, jdbc_cluster, secure_access.
spark.datasource.hive.warehouse.read.jdbc.mode      JDBC read mode: client or cluster.
spark.datasource.hive.warehouse.load.staging.dir    Fully qualified staging directory URI used by secure access and batch writes.
spark.sql.extensions                                Enables the HWC Spark SQL extension for direct reader integration.
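
As a minimal sketch, the core properties can be set together in spark-defaults.conf or passed with --conf at launch. The host names, realm, and port below are placeholders; the read mode and staging path reuse values from this page:

spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://hs2-host:10500/default
spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@EXAMPLE.COM
spark.hadoop.hive.metastore.uris=thrift://hms-host:9083
spark.datasource.hive.warehouse.read.mode=direct_reader_v2
spark.datasource.hive.warehouse.load.staging.dir=hdfs://nameservice/apps/hwc_staging
spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions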

HiveServer2 (HS2) configuration

HWC connects to HS2 via JDBC. For Kerberos clusters:

  • YARN client mode: set spark.sql.hive.hiveserver2.jdbc.url.principal.
  • YARN cluster mode: set spark.security.credentials.hiveserver2.enabled=true so Spark obtains HS2 delegation tokens (see the example below).
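
For example, a cluster-mode launch might look like the following sketch; the application jar and the HS2 host and port are placeholders:

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.security.credentials.hiveserver2.enabled=true \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hs2-host:10500/default" \
  app.jar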

Optionally, use spark.sql.hive.conf.list to append Hive confs to the HS2 URL. This is useful for passing query-level Hive settings without modifying hive-site.xml.
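
For illustration, the sketch below passes two standard Hive settings chosen purely as examples; settings are typically separated with semicolons, though verify the exact delimiter for your release:

spark.sql.hive.conf.list=hive.fetch.task.conversion=none;hive.exec.dynamic.partition.mode=nonstrict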

Hive Metastore (HMS)

HWC uses HMS for table metadata and for direct reader mode. Ensure Spark can reach HMS:

spark.hadoop.hive.metastore.uris=thrift://hms-host:9083

For metastore HA, list multiple Thrift URIs separated by commas. For HDFS HA, use the nameservice alias rather than a NameNode host:port in staging directory URIs.
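
For example, with two metastore instances (host names are placeholders):

spark.hadoop.hive.metastore.uris=thrift://hms1.example.com:9083,thrift://hms2.example.com:9083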

LLAP configuration

Direct reader modes read ORC files directly from the warehouse, while JDBC reads through HS2 Interactive are executed by LLAP daemons that Spark locates via ZooKeeper service discovery. Typical LLAP-related properties include:

spark.hadoop.hive.llap.daemon.service.hosts=@llap0
spark.hadoop.hive.zookeeper.quorum=zk1:2181,zk2:2181,zk3:2181

Staging directory for secure access and writes

Secure access mode and batch writes stage data in HDFS. Use a fully qualified URI:

spark.datasource.hive.warehouse.load.staging.dir=hdfs://nameservice/apps/hwc_staging

Set the sticky bit on the parent directory (for example, mode 1703) so users can create their own session folders but cannot remove other users' data.
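
A sketch of creating the staging parent with standard HDFS shell commands, reusing the path from the example above:

hdfs dfs -mkdir -p /apps/hwc_staging
hdfs dfs -chmod 1703 /apps/hwc_staging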

Spark SQL extension

To use the HWC direct reader via Spark SQL, enable the extension:

spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions
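
A minimal spark-shell launch sketch with the extension and direct reader enabled; the assembly jar path is a placeholder:

spark-shell \
  --jars /path/to/hive-warehouse-connector-assembly.jar \
  --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions \
  --conf spark.datasource.hive.warehouse.read.mode=direct_reader_v2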

Optional: Kryo registrator

For large ORC rows, set a Kryo registrator and increase the buffer:

spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=com.hortonworks.spark.hive.utils.HiveAcidKyroRegistrator
spark.kryoserializer.buffer.max=256m