Read Modes
The Hive Warehouse Connector (HWC) supports several read modes. Choose a mode based on your security requirements, performance needs, and data size.
Configure with:
spark.datasource.hive.warehouse.read.mode=secure_access
spark.datasource.hive.warehouse.read.jdbc.mode=cluster
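For example, the mode can be selected when the SparkSession is built. The following is a minimal sketch; it assumes the HWC assembly jar and the HiveServer2 connection properties (JDBC URL and so on) are already configured elsewhere, for example in spark-defaults.conf, and the application name is illustrative:

import org.apache.spark.sql.SparkSession

// Minimal sketch: select the HWC read mode at session creation time.
// Connection settings (HiveServer2 JDBC URL, etc.) are assumed to come
// from the cluster's Spark defaults; only the mode property is shown.
val spark = SparkSession.builder()
  .appName("hwc-read-mode-example")   // illustrative name
  .config("spark.datasource.hive.warehouse.read.mode", "secure_access")
  // Depending on the HWC release, the jdbc.mode property may be used instead:
  // .config("spark.datasource.hive.warehouse.read.jdbc.mode", "cluster")
  .getOrCreate()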
Direct reader (LLAP)
direct_reader_v1 and direct_reader_v2 read ORC data directly via LLAP without routing queries through HS2.
This is the fastest option but does not enforce HS2/Ranger policies. Use it for trusted ETL pipelines.
Key characteristics:
- Reads a consistent snapshot of a single table at query time.
- Requires HDFS permissions to access table data.
- Does not apply Hive authorization (Ranger) at HS2.
- Does not support writes or streaming inserts.
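To illustrate, a direct reader query through the HWC session API might look like the sketch below. It assumes an HWC build that provides com.hortonworks.hwc.HiveWarehouseSession and a recent release exposing hive.sql (older releases expose executeQuery instead); the database, table, and column names are placeholders:

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession.
val hive = HiveWarehouseSession.session(spark).build()

// With a direct reader mode, the table's ORC data is read directly rather
// than through HS2; the executing user needs HDFS permissions on the table
// data because HS2/Ranger policies are not applied on this path.
val sales = hive.sql("SELECT id, amount FROM etl_db.sales WHERE ds = '2024-01-01'")
sales.show()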
JDBC mode
jdbc_cluster and jdbc_client send queries to HS2 and return the results to Spark. Use a JDBC mode when you need HS2 authorization enforcement and the query semantics provided by Hive.
- jdbc_client: results flow through the driver (simpler, slower).
- jdbc_cluster: HS2 streams results to the executors (better for larger results).
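As a sketch, a JDBC-mode read might look like the following. It assumes the session was launched with jdbc_cluster (or the older read.jdbc.mode=cluster) and that hive.table is available on the HiveWarehouseSession API; the table name is a placeholder:

import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

// The query runs in HS2; how results come back to Spark depends on the
// configured JDBC mode (through the driver for jdbc_client, streamed to
// the executors for jdbc_cluster).
val orders = hive.table("reports_db.orders")   // placeholder table name
orders.groupBy("region").count().show()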
Secure access mode
secure_access executes the query in HS2 and stages the results in a temporary directory using a CTAS (CREATE TABLE AS SELECT)
workflow. Spark then reads the staged ORC data. This is the recommended mode for Ranger-protected
clusters.
Requirements:
- Set spark.datasource.hive.warehouse.load.staging.dir to a fully qualified URI.
- Ensure the staging directory has proper permissions.
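Putting the requirements together, a secure access read might be configured as in the sketch below; the staging URI and table name are placeholders, and the same HiveWarehouseSession API as above is assumed:

import org.apache.spark.sql.SparkSession
import com.hortonworks.hwc.HiveWarehouseSession

// Select secure access mode and point the staging directory at a fully
// qualified URI (placeholder path shown).
val spark = SparkSession.builder()
  .config("spark.datasource.hive.warehouse.read.mode", "secure_access")
  .config("spark.datasource.hive.warehouse.load.staging.dir",
    "hdfs://ns1/tmp/hwc-staging")
  .getOrCreate()

val hive = HiveWarehouseSession.session(spark).build()

// HS2 executes the query with Ranger policies applied, stages the result
// as ORC via CTAS, and Spark reads the staged data.
val customers = hive.sql("SELECT * FROM secure_db.customers")   // placeholder
customers.show()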
Notes:
- Spark UDFs are not supported because the query is executed by Hive.
- Reads can fail with lock errors if another transaction holds an exclusive lock; retry after the other transaction commits.
- The cache can be disabled with spark.hadoop.secure.access.cache.disable=true.