
Hive Warehouse Connector (HWC)

Hive Warehouse Connector (HWC) integrates Spark SQL with Hive via HiveServer2 (HS2) and LLAP. It is designed for reading and writing Hive tables (including ACID) and for enforcing Hive authorization policies when required.

Version Highlights (Clemlab ODP)

  • Spark 3.5.x (default: 3.5.4)
  • Hive 4.0.1
  • Scala 2.12.18
  • Calcite 1.25.0
  • Avatica 1.12.0

These can be overridden with sbt system properties (for example: -Dspark.version=3.5.6).

Downloads (Nexus)

Artifacts are currently published as version 1.3.1, compatible with ODP 1.3.1.0.

  • Assembly (fat) jar: https://nexus.clemlab.com/repository/spark-packages/com/hortonworks/hive/hive-warehouse-connector_2.12/1.3.1/hive-warehouse-connector_2.12-1.3.1-assembly.jar
  • Main jar: https://nexus.clemlab.com/repository/spark-packages/com/hortonworks/hive/hive-warehouse-connector_2.12/1.3.1/hive-warehouse-connector_2.12-1.3.1.jar
  • Sources: https://nexus.clemlab.com/repository/spark-packages/com/hortonworks/hive/hive-warehouse-connector_2.12/1.3.1/hive-warehouse-connector_2.12-1.3.1-sources.jar
  • Javadoc: https://nexus.clemlab.com/repository/spark-packages/com/hortonworks/hive/hive-warehouse-connector_2.12/1.3.1/hive-warehouse-connector_2.12-1.3.1-javadoc.jar
  • POM: https://nexus.clemlab.com/repository/spark-packages/com/hortonworks/hive/hive-warehouse-connector_2.12/1.3.1/hive-warehouse-connector_2.12-1.3.1.pom
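
For sbt users, the same artifacts can be resolved straight from Nexus rather than downloaded by hand. A minimal build.sbt sketch, with the coordinates taken from the URLs above (anonymous read access to the repository is an assumption):

    // build.sbt sketch: resolve HWC from the Clemlab Nexus repository.
    // Group, artifact, and version come from the download URLs above;
    // anonymous read access to the repository is an assumption here.
    resolvers += "Clemlab Spark Packages" at "https://nexus.clemlab.com/repository/spark-packages/"

    // Plain % (not %%): the artifact name already carries the _2.12 suffix.
    libraryDependencies += "com.hortonworks.hive" % "hive-warehouse-connector_2.12" % "1.3.1"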

Supported Functionalities

The list below summarizes each feature area, its support status, and any relevant notes.

  • Read modes (direct reader, JDBC, secure access): Supported.
  • Secure access caching control: Supported. Controlled via spark.hadoop.secure.access.cache.disable.
  • JDBC read mode (client/cluster): Supported.
  • Spark SQL extension for direct reader: Supported. Enabled via spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions.
  • Batch writes (DataFrame writer): Supported. Stages the data, then issues LOAD DATA; see the sketch after this list.
  • Streaming writes to ACID tables: Supported. Uses the HiveStreamingDataSource sink; see the sketch after this list.
  • HWC session API (catalog ops, executeUpdate, commitTxn, close): Supported. See the API page and the session sketch below.
  • Spark executor metrics: Partial. Uses Spark metrics and the HWC listener; no extra packaging.
  • PySpark integration: Supported. Ships a python/ package alongside the assembly jar.
  • sparklyr integration: Not shipped. Helpers are not bundled in this release.
  • Zeppelin integration: Manual. Add the jar and Spark conf yourself; nothing is packaged.
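
As a rough illustration of the two write paths above: a batch write through the HWC data source and a streaming write through the HiveStreamingDataSource sink. The format strings follow the upstream HWC class names; the database, table, metastore URI, and checkpoint location are placeholders, and df / streamingDf stand for DataFrames you already have.

    // Sketch: batch write via the HWC data source. HWC stages the data,
    // then loads it into the Hive table with LOAD DATA.
    df.write
      .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
      .mode("append")
      .option("table", "sales.transactions") // placeholder table
      .save()

    // Sketch: streaming write into a Hive ACID table through the
    // HiveStreamingDataSource sink. Option values are placeholders.
    val query = streamingDf.writeStream
      .format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource")
      .option("database", "sales")
      .option("table", "transactions")
      .option("metastoreUri", "thrift://metastore-host:9083")
      .option("checkpointLocation", "/tmp/hwc-checkpoint")
      .start()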

If you rely on a feature that is not bundled (sparklyr helpers, Zeppelin configuration templates), you can still use the connector by adding the assembly jar and setting the Spark configs manually, as sketched below.
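
A minimal end-to-end sketch of that manual setup: launch the shell with the assembly jar and the extension, then go through the HWC session API. The HS2 JDBC URL, database, and table names are placeholders; the session methods follow the upstream HiveWarehouseSession API.

    // Launch sketch (shell command shown as a comment):
    //   spark-shell \
    //     --jars hive-warehouse-connector_2.12-1.3.1-assembly.jar \
    //     --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions \
    //     --conf spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://hs2-host:10000
    import com.hortonworks.hwc.HiveWarehouseSession

    // Build an HWC session on top of the active SparkSession.
    val hive = HiveWarehouseSession.session(spark).build()

    // Catalog op plus a read; the query runs through the configured read mode.
    hive.setDatabase("sales") // placeholder database
    hive.executeQuery("SELECT * FROM transactions LIMIT 10").show()

    // DDL via executeUpdate, then release the session's resources.
    hive.executeUpdate("CREATE TABLE IF NOT EXISTS demo (id INT)")
    hive.close()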

Where to Start

  • Start with Getting Started for a minimal setup and a Spark shell example.
  • Use Configuration and Read Modes to choose between secure access and the direct reader.
  • Use Writes and Streaming for batch and streaming write configuration.
  • Review Limitations and Types before a production rollout.