  • 5 Hive

    Bootcamp, Bigdata, 2020-12-17


    What is Hive?

    • is an open-source data warehouse system
    • built on top of Hadoop for querying and analyzing large datasets
    • is NOT a relational database
    • is NOT designed for online transaction processing (OLTP)
    • Access to data via SQL-like queries (HiveQL)
    • Data summarization and aggregation
    • Analysis

    Hive is an open-source data warehouse system built on top of Hadoop for querying and analyzing large datasets. Hive abstracts the complexity of Hadoop. It provides an easy-to-use SQL-like syntax called HiveQL, and enables users to do ad-hoc querying, summarization and data analysis. Hive implicitly converts HiveQL statements into a directed acyclic graph (DAG) of MapReduce, Tez, or Spark jobs, which are submitted to Hadoop for execution.
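
To make the conversion described above more concrete, here is a toy sketch in plain Python of how a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` could be broken into map, shuffle, and reduce stages. It is not Hive's planner or any Hive API; the table name `employees`, the column `dept`, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_stage(rows):
    """Emit one (group key, 1) pair per input row."""
    for row in rows:
        yield row["dept"], 1


def shuffle_stage(pairs):
    """Group the emitted values by key, as the shuffle between map and reduce does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_stage(grouped):
    """Sum the values for each key to get the per-group row count."""
    return {key: sum(values) for key, values in grouped.items()}


def run_plan(rows):
    """Run the three stages in order, like a small linear DAG of jobs."""
    return reduce_stage(shuffle_stage(map_stage(rows)))


# Hypothetical input rows standing in for the 'employees' table.
rows = [{"dept": "sales"}, {"dept": "ops"}, {"dept": "sales"}]
counts = run_plan(rows)
print(counts)
```

A real Hive plan can contain many such stages with dependencies between them, which is why the result is described as a graph rather than a single job.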

    Hive is best suited to traditional data warehousing tasks such as batch summarization and analysis, not to online transaction processing.

    Hive Architecture

    Hive components

    1. Metastore could be configured as:

      • Embedded (derby DB)
      • Local
      • Remote
    2. Metastore DB: an RDBMS such as MySQL, Oracle, PostgreSQL, or MS SQL Server.
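
The three metastore modes listed above are usually distinguished by two settings in `hive-site.xml`: `hive.metastore.uris` and `javax.jdo.option.ConnectionURL`. The sketch below is a rough heuristic in Python, not part of Hive, that guesses the mode from a dictionary of those properties. The function name `metastore_mode` and the host names in the examples are hypothetical.

```python
def metastore_mode(props):
    """Guess the metastore mode from hive-site.xml style properties.

    Heuristic, for illustration only:
    - remote:   clients reach a separate metastore service over Thrift
    - embedded: metastore and an embedded Derby database share the Hive JVM
    - local:    metastore runs in the Hive JVM but uses an external database
    """
    uris = props.get("hive.metastore.uris", "").strip()
    if uris:
        return "remote"

    jdbc_url = props.get("javax.jdo.option.ConnectionURL", "")
    # 'jdbc:derby://host:port/...' is Derby in network server mode,
    # which is an external database and therefore not the embedded case.
    if jdbc_url.startswith("jdbc:derby:") and not jdbc_url.startswith("jdbc:derby://"):
        return "embedded"
    return "local"


# Hypothetical example configurations.
embedded = {"javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=metastore_db;create=true"}
local = {"javax.jdo.option.ConnectionURL": "jdbc:mysql://dbhost:3306/metastore"}
remote = {"hive.metastore.uris": "thrift://metahost:9083"}

for name, props in [("embedded", embedded), ("local", local), ("remote", remote)]:
    print(name, "->", metastore_mode(props))
```

Embedded mode is generally only practical for a single user or for testing, since an embedded Derby database accepts one connection at a time.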


    Data Units

    1. Partitions

    2. Buckets
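
As a rough illustration of the two data units above: a partition typically maps to a subdirectory named `column=value` under the table's directory, and a bucket is chosen by hashing a column value modulo the number of buckets. The Python sketch below models both ideas. The paths, column names, and function names are hypothetical, and the hash is simplified to integer keys that hash to themselves; Hive's real hash function depends on the column type and the Hive version.

```python
def partition_path(table_dir, partition_spec):
    """Build the directory for a partition from (column, value) pairs."""
    parts = [f"{col}={val}" for col, val in partition_spec]
    return "/".join([table_dir] + parts)


def bucket_for(key, num_buckets):
    """Pick a bucket index for an integer key.

    Assumption: the key's hash is the integer itself, masked to be
    non-negative before taking the modulo.
    """
    return (key & 0x7FFFFFFF) % num_buckets


# Hypothetical table directory and partition columns.
path = partition_path("/warehouse/sales", [("dt", "2020-12-17"), ("country", "CA")])
print(path)

# Rows with the same key always land in the same bucket.
print(bucket_for(42, 4), bucket_for(7, 4))
```

Partitioning lets a query skip whole directories when it filters on the partition column, while bucketing spreads rows within a partition across a fixed number of files.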

    Hive Supported File Formats

    Hive types

    • Sampling
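
HiveQL supports bucket sampling with a clause of the form `TABLESAMPLE(BUCKET x OUT OF y ON col)`, which keeps the rows that fall into bucket `x` when the table is hashed into `y` buckets on `col`. The Python sketch below mimics that selection rule on an in-memory list. It is only a model of the idea: the function name `bucket_sample` and the sample rows are hypothetical, and integer keys are assumed to hash to themselves, which may not match Hive's actual hashing.

```python
def bucket_sample(rows, key, x, y):
    """Return the rows that fall into bucket x (1-based) out of y buckets."""
    if not 1 <= x <= y:
        raise ValueError("x must satisfy 1 <= x <= y")
    # Assumption: an integer key hashes to itself (masked to be non-negative).
    return [row for row in rows if (row[key] & 0x7FFFFFFF) % y == x - 1]


# Hypothetical table with ten rows, ids 0 through 9.
rows = [{"id": i} for i in range(10)]
sample = bucket_sample(rows, "id", 1, 4)
print([row["id"] for row in sample])
```

With evenly distributed keys, each bucket holds roughly 1/y of the rows, so choosing a larger `y` gives a smaller sample.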