5 Hive
BootcampBigdata2020-12-17
What is Hive?
- is an open-source data warehouse system
- built on top of Hadoop for querying and analyzing large datasets
- is a NOT relational database
- is NOT designed for online transaction progress
- Access to data via SQL-like queries (HiveQL)
- Data summarization and aggregation
- Analysis
Hive is an open-source data warehouse system built on top of Hadoop for querying and analyzing large datasets. Hive abstracts the complexity of Hadoop. It provides easy to use SQL-like syntax called HiveQL, and enables users to do ad-hoc querying, summarization and data analysis. Hive implicitly converts HiveQL statements into a directed acyclic graph (ζεζ η―εΎ) of MapReduce, Tez, or Spark jobs, which are submitted to Hadoop for execution.
Hive is more suitable for traditional data warehousing tasks.
-
Metastore could be configured as:
- Embedded (derby DB)
- Local
- Remote
- Metastore DB: RDBS - MySQL, Oralce, Postgres, MS-SQL.
Data Units
1. Partitions
2. Buckets
- Sampling ιζ ·
