Read Modes
The Hive Warehouse Connector (HWC) supports several read modes. Choose a mode based on your security requirements, performance needs, and data size.
Configure with:
spark.datasource.hive.warehouse.read.mode=secure_access
spark.datasource.hive.warehouse.read.jdbc.mode=cluster
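For example, the mode can be selected when the SparkSession is built. The following is a minimal sketch; it assumes the HWC assembly jar and the HiveServer2 connection properties (JDBC URL and so on) are already configured elsewhere, for example in spark-defaults.conf, and the application name is illustrative:

import org.apache.spark.sql.SparkSession

// Minimal sketch: select the HWC read mode at session creation time.
// Connection settings (HiveServer2 JDBC URL, etc.) are assumed to come
// from the cluster's Spark defaults; only the mode property is shown.
val spark = SparkSession.builder()
  .appName("hwc-read-mode-example")   // illustrative name
  .config("spark.datasource.hive.warehouse.read.mode", "secure_access")
  // Depending on the HWC release, the jdbc.mode property may be used instead:
  // .config("spark.datasource.hive.warehouse.read.jdbc.mode", "cluster")
  .getOrCreate()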
Direct reader (LLAP)
direct_reader_v1 and direct_reader_v2 read ORC data directly via LLAP without routing queries through HS2.
This is the fastest option but does not enforce HS2/Ranger policies. Use it for trusted ETL pipelines.
Key characteristics:
- Reads a consistent snapshot of a single table at query time.
- Requires HDFS permissions to access table data.
- Does not apply Hive authorization (Ranger) at HS2.
- Does not support writes or streaming inserts.
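To illustrate, a direct reader query through the HWC session API might look like the sketch below. It assumes an HWC build that provides com.hortonworks.hwc.HiveWarehouseSession and a recent release exposing hive.sql (older releases expose executeQuery instead); the database, table, and column names are placeholders:

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession.
val hive = HiveWarehouseSession.session(spark).build()

// With a direct reader mode, the table's ORC data is read directly rather
// than through HS2; the executing user needs HDFS permissions on the table
// data because HS2/Ranger policies are not applied on this path.
val sales = hive.sql("SELECT id, amount FROM etl_db.sales WHERE ds = '2024-01-01'")
sales.show()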
JDBC mode
jdbc_cluster and jdbc_client send queries to HS2 and return the results to Spark. Use a JDBC mode when you need HS2 authorization enforcement and the query semantics provided by Hive.
- jdbc_client: results flow through the driver (simpler, slower).
- jdbc_cluster: HS2 streams results to the executors (better for larger results).
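As a sketch, a JDBC-mode read might look like the following. It assumes the session was launched with jdbc_cluster (or the older read.jdbc.mode=cluster) and that hive.table is available on the HiveWarehouseSession API; the table name is a placeholder:

import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

// The query runs in HS2; how results come back to Spark depends on the
// configured JDBC mode (through the driver for jdbc_client, streamed to
// the executors for jdbc_cluster).
val orders = hive.table("reports_db.orders")   // placeholder table name
orders.groupBy("region").count().show()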
Secure access mode
secure_access executes the query in HS2 and stages the results in a temporary directory using a CTAS (CREATE TABLE AS SELECT)
workflow. Spark then reads the staged ORC data. This is the recommended mode for Ranger-protected
clusters.
Requirements:
- Set spark.datasource.hive.warehouse.load.staging.dir to a fully qualified URI.
- Ensure the staging directory has proper permissions.
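Putting the requirements together, a secure access read might be configured as in the sketch below; the staging URI and table name are placeholders, and the same HiveWarehouseSession API as above is assumed:

import org.apache.spark.sql.SparkSession
import com.hortonworks.hwc.HiveWarehouseSession

// Select secure access mode and point the staging directory at a fully
// qualified URI (placeholder path shown).
val spark = SparkSession.builder()
  .config("spark.datasource.hive.warehouse.read.mode", "secure_access")
  .config("spark.datasource.hive.warehouse.load.staging.dir",
    "hdfs://ns1/tmp/hwc-staging")
  .getOrCreate()

val hive = HiveWarehouseSession.session(spark).build()

// HS2 executes the query with Ranger policies applied, stages the result
// as ORC via CTAS, and Spark reads the staged data.
val customers = hive.sql("SELECT * FROM secure_db.customers")   // placeholder
customers.show()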
Notes:
- Spark UDFs are not supported because the query is executed by Hive.
- Reads can fail with lock errors if another transaction holds an exclusive lock; retry after the other transaction commits.
- The cache can be disabled with spark.hadoop.secure.access.cache.disable=true.