Configuration
This page summarizes the core Spark and Hive settings used by the Hive Warehouse Connector (HWC).
Core properties
| Property | Purpose |
|---|---|
| spark.sql.hive.hiveserver2.jdbc.url | HS2 JDBC URL (LLAP). |
| spark.sql.hive.hiveserver2.jdbc.url.principal | Kerberos principal for HS2. |
| spark.hadoop.hive.metastore.uris | HMS Thrift URI(s). |
| spark.datasource.hive.warehouse.read.mode | Read mode: direct_reader_v1, direct_reader_v2, jdbc_client, jdbc_cluster, or secure_access. |
| spark.datasource.hive.warehouse.read.jdbc.mode | JDBC read mode: client or cluster. |
| spark.datasource.hive.warehouse.load.staging.dir | Fully qualified staging directory URI used by secure access mode and batch writes. |
| spark.sql.extensions | Enables the HWC Spark SQL extension for direct reader integration. |
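As a quick illustration, the sketch below shows one way these properties can be wired up from application code, assuming the HWC jar is already on the classpath; the hostnames, principal, staging path, and table name are placeholders, and most deployments set these properties in spark-defaults.conf or on the spark-submit command line instead.

```scala
import org.apache.spark.sql.SparkSession
import com.hortonworks.hwc.HiveWarehouseSession

// All hostnames, the principal, the staging path, and the table name are placeholders.
val spark = SparkSession.builder()
  .appName("hwc-config-example")
  .config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://hs2-host:10000/default")
  .config("spark.sql.hive.hiveserver2.jdbc.url.principal", "hive/_HOST@EXAMPLE.COM")
  .config("spark.hadoop.hive.metastore.uris", "thrift://hms-host:9083")
  .config("spark.datasource.hive.warehouse.read.mode", "direct_reader_v2")
  .config("spark.datasource.hive.warehouse.load.staging.dir", "hdfs://nameservice/apps/hwc_staging")
  .config("spark.sql.extensions", "com.hortonworks.spark.sql.rule.Extensions")
  .getOrCreate()

// Build an HWC session and run a query through HiveServer2.
val hive = HiveWarehouseSession.session(spark).build()
hive.executeQuery("SELECT * FROM sales LIMIT 10").show()
```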
HiveServer2 (HS2) configuration
HWC connects to HS2 via JDBC. For Kerberos clusters:
- YARN client mode: set spark.sql.hive.hiveserver2.jdbc.url.principal.
- YARN cluster mode: set spark.security.credentials.hiveserver2.enabled=true to obtain HS2 delegation tokens.
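For example, a Kerberized client-mode job might carry (the principal value is a placeholder):

spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@EXAMPLE.COM

while a cluster-mode job would instead set:

spark.security.credentials.hiveserver2.enabled=true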
Optionally, use spark.sql.hive.conf.list to append Hive confs to the HS2 URL. This is useful for passing
query-level Hive settings without modifying hive-site.xml.
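As a rough sketch, the Hive properties below are arbitrary examples, and the exact list syntax (semicolon-separated key=value pairs is assumed here) should be checked against your HWC version:

spark.sql.hive.conf.list=hive.vectorized.execution.enabled=true;hive.exec.parallel=true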
Hive Metastore (HMS)
HWC uses HMS for table metadata and for direct reader mode. Ensure Spark can reach HMS:
spark.hadoop.hive.metastore.uris=thrift://hms-host:9083
For HA deployments, list all HMS Thrift URIs (comma-separated) and use the HDFS nameservice alias rather than a specific NameNode host:port in staging directory URIs.
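For example, with two metastore instances (hostnames are placeholders):

spark.hadoop.hive.metastore.uris=thrift://hms1:9083,thrift://hms2:9083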
LLAP configuration
Direct reader mode uses LLAP and ORC readers. Typical LLAP-related properties include:
spark.hadoop.hive.llap.daemon.service.hosts=@llap0
spark.hadoop.hive.zookeeper.quorum=zk1:2181,zk2:2181,zk3:2181
Staging directory for secure access and writes
Secure access mode and batch writes stage data in HDFS. Use a fully qualified URI:
spark.datasource.hive.warehouse.load.staging.dir=hdfs://nameservice/apps/hwc_staging
Set the sticky bit on the parent directory (for example, mode 1703) so that users can create session
folders but cannot remove other users' data.
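A minimal setup sketch, assuming the hive service user owns the directory and the example mode above applies (adjust the owner, group, and mode to your security policy):

hdfs dfs -mkdir -p hdfs://nameservice/apps/hwc_staging
hdfs dfs -chown hive:hadoop hdfs://nameservice/apps/hwc_staging
hdfs dfs -chmod 1703 hdfs://nameservice/apps/hwc_staging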
Spark SQL extension
To use the HWC direct reader via Spark SQL, enable the extension:
spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions
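Once the extension and a direct reader mode are enabled on the session, Hive managed (ACID) tables can be read with plain spark.sql; a minimal sketch, where the database and table names are placeholders:

```scala
// Assumes spark.sql.extensions and spark.datasource.hive.warehouse.read.mode
// were set when the SparkSession was created (see Core properties above).
val df = spark.sql("SELECT id, amount FROM sales_db.transactions WHERE amount > 100")
df.show()
```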
Optional: Kryo registrator
For large ORC rows, set a Kryo registrator and increase the buffer:
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=com.hortonworks.spark.hive.utils.HiveAcidKyroRegistrator
spark.kryoserializer.buffer.max=256m