Hive Warehouse Connector (HWC)
Hive Warehouse Connector (HWC) integrates Spark SQL with Hive via HiveServer2 (HS2) and LLAP. It is designed for reading and writing Hive tables (including ACID) and for enforcing Hive authorization policies when required.
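For orientation, a minimal session in the Spark shell (Scala) might look like the sketch below. The table and statement are placeholders; the `HiveWarehouseSession` builder entry point shown here is the usual HWC session API (see the API page for the full surface).

```scala
// Minimal HWC session sketch (run in spark-shell with the assembly jar on the classpath).
// Database and table names are placeholders.
import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session from the active SparkSession (`spark` in spark-shell).
val hive = HiveWarehouseSession.session(spark).build()

// Run a Hive query through HS2 and get the result back as a Spark DataFrame.
val df = hive.executeQuery("SELECT * FROM default.my_acid_table LIMIT 10")
df.show()

// DDL and other catalog operations go through executeUpdate.
hive.executeUpdate("CREATE TABLE IF NOT EXISTS default.demo (id INT) STORED AS ORC")

// Release HS2 resources when done.
hive.close()
```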
Version Highlights (Clemlab ODP)
- Spark 3.5.x (default: 3.5.4)
- Hive 4.0.1
- Scala 2.12.18
- Calcite 1.25.0
- Avatica 1.12.0
These versions can be overridden at build time with sbt system properties (for example, `-Dspark.version=3.5.6`).
Downloads (Nexus)
Artifacts are currently published as version 1.3.1, compatible with ODP 1.3.1.0.
| Artifact | Link |
|---|---|
| Assembly (fat) jar | https://nexus.clemlab.com/repository/spark-packages/com/hortonworks/hive/hive-warehouse-connector_2.12/1.3.1/hive-warehouse-connector_2.12-1.3.1-assembly.jar |
| Main jar | https://nexus.clemlab.com/repository/spark-packages/com/hortonworks/hive/hive-warehouse-connector_2.12/1.3.1/hive-warehouse-connector_2.12-1.3.1.jar |
| Sources | https://nexus.clemlab.com/repository/spark-packages/com/hortonworks/hive/hive-warehouse-connector_2.12/1.3.1/hive-warehouse-connector_2.12-1.3.1-sources.jar |
| Javadoc | https://nexus.clemlab.com/repository/spark-packages/com/hortonworks/hive/hive-warehouse-connector_2.12/1.3.1/hive-warehouse-connector_2.12-1.3.1-javadoc.jar |
| POM | https://nexus.clemlab.com/repository/spark-packages/com/hortonworks/hive/hive-warehouse-connector_2.12/1.3.1/hive-warehouse-connector_2.12-1.3.1.pom |
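If you prefer resolving the connector through a build rather than downloading the jars directly, an sbt sketch using the coordinates from the Nexus paths above could look like this (the resolver name is arbitrary):

```scala
// build.sbt sketch; coordinates derived from the artifact paths in the table above.
scalaVersion := "2.12.18"

resolvers += "clemlab-spark-packages" at "https://nexus.clemlab.com/repository/spark-packages/"

// "%%" appends the Scala binary suffix, yielding hive-warehouse-connector_2.12.
libraryDependencies += "com.hortonworks.hive" %% "hive-warehouse-connector" % "1.3.1"
```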
Supported Functionalities
The table below summarizes which features are supported.
| Feature area | Status | Notes |
|---|---|---|
| Read modes (direct reader, JDBC, secure access) | Supported | - |
| Secure access caching control | Supported | Controlled via `spark.hadoop.secure.access.cache.disable`. |
| JDBC read mode (client/cluster) | Supported | - |
| Spark SQL extension for direct reader | Supported | Requires `spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions`. |
| Batch writes (DataFrame writer) | Supported | Stages data to files, then issues `LOAD DATA`. |
| Streaming writes to ACID tables | Supported | `HiveStreamingDataSource` sink. |
| HWC session API (catalog ops, executeUpdate, commitTxn, close) | Supported | See API page. |
| Spark executor metrics | Partial | Uses Spark metrics and the HWC listener; no extra packaging. |
| PySpark integration | Supported | `python/` package plus the assembly jar. |
| sparklyr integration | Not shipped | Not bundled in this distribution. |
| Zeppelin integration | Manual | Add the jar and Spark configs manually; not packaged. |
If you rely on a feature that is not bundled (sparklyr helpers, Zeppelin configuration templates), you can still use the connector by adding the assembly jar and setting Spark configs manually.
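As an illustration, a manual setup in Scala might look like the sketch below. The extension class and the secure-access cache flag come from the table above; the HS2 JDBC URL property name (`spark.sql.hive.hiveserver2.jdbc.url`) and its example value are assumptions based on common HWC deployments, so verify them against your cluster.

```scala
// Sketch: configuring a SparkSession for HWC by hand. Values are placeholders.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hwc-manual-setup")
  // The direct reader requires the HWC Spark SQL extension (see the table above).
  .config("spark.sql.extensions", "com.hortonworks.spark.sql.rule.Extensions")
  // Optional: disable secure-access result caching (see the table above).
  .config("spark.hadoop.secure.access.cache.disable", "true")
  // ASSUMPTION: HS2 JDBC URL property commonly used by HWC; adjust host and port.
  .config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://hs2-host:10000/default")
  .getOrCreate()
```

In interactive use, the same properties are typically passed with `--conf` at launch, together with `--jars` pointing at the assembly jar.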
Where to start
- Start with Getting Started for a minimal setup and a Spark shell example.
- Use Configuration and Read Modes to choose between secure access and the direct reader.
- Use Writes and Streaming for batch and streaming write configuration (a short sketch follows this list).
- Review Limitations and Types before a production rollout.
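For reference, a batch write goes through the HWC data source, while a streaming write targets the `HiveStreamingDataSource` sink noted in the feature table. The sketch below uses placeholder table names and a demo streaming source; the fully qualified source names and the `metastoreUri` option follow conventional HWC usage and should be verified against your jar and cluster.

```scala
// Sketch: batch and streaming writes via HWC. Table names and URIs are placeholders.
import org.apache.spark.sql.{DataFrame, SparkSession}

def batchWrite(df: DataFrame): Unit = {
  // Batch write: HWC stages the data, then issues LOAD DATA into the Hive table.
  df.write
    .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
    .mode("append")
    .option("table", "default.my_acid_table")
    .save()
}

def streamingWrite(spark: SparkSession): Unit = {
  // Streaming write: the HiveStreamingDataSource sink appends micro-batches
  // to a transactional (ACID) Hive table.
  val query = spark.readStream
    .format("rate") // demo source; replace with Kafka or another real source
    .load()
    .selectExpr("CAST(value AS STRING) AS msg")
    .writeStream
    .format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource")
    .option("database", "default")
    .option("table", "my_acid_table")
    .option("metastoreUri", "thrift://metastore-host:9083") // ASSUMPTION: adjust to your cluster
    .option("checkpointLocation", "/tmp/hwc-checkpoint")
    .start()
  query.awaitTermination()
}
```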