  • 3 Hdfs

    Bootcamp, Bigdata, 2020-12-17


    HDFS

    1. NameNode

    Master, Metadata

    • The NameNode daemon must be running at all times
    • If the NameNode stops, the cluster becomes inaccessible
    • The NameNode stores all metadata
    • Metadata = fs image + edit log / held in memory
    • fs image = paths + block ids + user + group + permissions / written to disk
    • edit log = operations / written to disk
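How the fs image and the edit log combine can be sketched with a toy model. At startup the NameNode loads the last fs image and replays the edit log on top of it to rebuild the in-memory metadata. The dictionary layout, the operation names (`create`, `chown`, `delete`) and the sample paths below are assumptions made for illustration; the real files are binary and managed by the NameNode.

```python
# Toy model only: real fsimage / edit log files are binary and NameNode-managed.

def replay(fsimage, edit_log):
    """Return the namespace obtained by applying edit_log to a copy of fsimage."""
    # Copy each entry so the on-disk snapshot (fsimage) is left untouched.
    namespace = {path: dict(meta) for path, meta in fsimage.items()}
    for op, path, meta in edit_log:
        if op == "create":
            namespace[path] = dict(meta)
        elif op == "chown":
            namespace[path].update(meta)
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


# Snapshot: paths + block ids + user + group + permissions (hypothetical values).
fsimage = {
    "/data/a.txt": {"blocks": ["blk_1"], "user": "alice", "group": "staff", "perm": "644"},
}

# Operations recorded since the snapshot was taken (hypothetical values).
edit_log = [
    ("create", "/data/b.txt", {"blocks": ["blk_2"], "user": "bob", "group": "staff", "perm": "600"}),
    ("chown", "/data/a.txt", {"user": "carol"}),
    ("delete", "/data/b.txt", None),
]

namespace = replay(fsimage, edit_log)
print(namespace)
```

The snapshot stays small and stable on disk, while every change is appended to the log; merging the two gives the current state.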

    NameNode itself, Secondary NameNode, Standby NameNode

    2. DataNode

    Slave, read/write

    HDFS architecture

    NameNode

    • NameNode daemon must be running at all times
    • If the NameNode stops, the cluster becomes inaccessible
    • The NameNode stores all metadata

      • file locations in HDFS
      • file ownership and permissions
      • names of the individual blocks
      • locations of the blocks
    • Metadata = fs image + edit log / held in memory
    • fs image = paths + block ids + user + group + permissions / written to disk
    • edit log = operations / written to disk
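The block names and block locations listed above can be pictured as a map from each file to its blocks and the DataNodes holding each replica. The sketch below is an illustration only: it assumes a 128 MB block size and a replication factor of 3 (common defaults), uses hypothetical DataNode names, and places replicas round-robin, which is a simplification and not HDFS's rack-aware placement policy.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB
REPLICATION = 3                 # assumed replication factor


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a list of (block_name, [datanode, ...]) for a file of file_size bytes."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    n_blocks = math.ceil(file_size / block_size)
    plan = []
    for i in range(n_blocks):
        # Round-robin placement: a simplification, not the real rack-aware policy.
        replicas = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        plan.append((f"blk_{i}", replicas))
    return plan


datanodes = ["dn1", "dn2", "dn3", "dn4"]           # hypothetical DataNode names
block_map = plan_blocks(300 * 1024 * 1024, datanodes)  # a 300 MB file -> 3 blocks
for name, replicas in block_map:
    print(name, replicas)
```

The NameNode keeps only this kind of map; the block contents themselves live on the DataNodes.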

    high availability 1

    Hadoop CLI

    • hadoop
    • hdfs
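A few everyday file system commands, sketched as a small Python wrapper around the `hadoop fs` shell. The helper names `hadoop_fs` and `run` and all paths are hypothetical; the sketch only builds the argument lists and runs them when a `hadoop` executable is actually on the PATH.

```python
import shutil
import subprocess


def hadoop_fs(*args):
    """Build the argument list for a `hadoop fs` invocation."""
    return ["hadoop", "fs", *args]


def run(cmd):
    """Run cmd if its executable is on PATH; otherwise return None."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


# Hypothetical paths, for illustration only.
ls_cmd = hadoop_fs("-ls", "/user")                        # list a directory
mkdir_cmd = hadoop_fs("-mkdir", "-p", "/user/demo")       # create a directory tree
put_cmd = hadoop_fs("-put", "local.txt", "/user/demo/")   # upload a local file

for cmd in (ls_cmd, mkdir_cmd, put_cmd):
    print(" ".join(cmd))
```

On a machine with a configured cluster, `run(ls_cmd)` would return a `CompletedProcess` whose `stdout` holds the listing.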

    Data Units

    • Databases
    • Tables
    • Partitions
    • Buckets

    File formats

    • Text File
    • Sequence File
    • AVRO File
    • RC File
    • ORC File
    • Parquet File
    • Custom INPUTFORMAT and OUTPUTFORMAT

    Partitions are stored as directories; buckets are stored as files.
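The mapping from data units to storage can be sketched as a path builder, assuming these notes describe Apache Hive's data model. The warehouse root, the `<database>.db` directory, the `column=value` partition directories and the `00000N_0` bucket file names follow the layout Hive commonly uses, but treat them as assumptions here. The bucket function is a stand-in (integer key modulo bucket count), not Hive's actual hash, and the database, table and column names are hypothetical.

```python
WAREHOUSE = "/user/hive/warehouse"  # assumed warehouse root directory


def bucket_for(key, num_buckets):
    """Stand-in bucket function: integer key modulo the number of buckets."""
    return key % num_buckets


def bucket_file_path(database, table, partition, key, num_buckets):
    """Return the file path a row would land in: partition = directory, bucket = file."""
    # One directory level per partition column, in the order given.
    part_dirs = "/".join(f"{col}={val}" for col, val in partition.items())
    bucket = bucket_for(key, num_buckets)
    return f"{WAREHOUSE}/{database}.db/{table}/{part_dirs}/{bucket:06d}_0"


# Hypothetical database, table, partition column and key.
path = bucket_file_path("sales", "orders", {"dt": "2020-12-17"}, key=42, num_buckets=4)
print(path)  # /user/hive/warehouse/sales.db/orders/dt=2020-12-17/000002_0
```

Partition pruning works because a query on `dt` only has to open the matching directory; bucketing then narrows the read to one file inside it.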