3 HDFS
Bootcamp Bigdata 2020-12-17
HDFS
- NameNode: master, holds the metadata
NameNode itself, Secondary NameNode, Standby NameNode
- DataNode: slave, handles read/write
NameNode
- The NameNode daemon must be running at all times
- If the NameNode stops, the cluster becomes inaccessible
- The NameNode stores all metadata:
- file locations in HDFS
- file ownership and permissions
- names of the individual blocks
- locations of the blocks
- Metadata = fsimage + edit log, held in memory
- fsimage = paths + block IDs + user + group + permissions, written to disk
- edit log = operations applied since the last fsimage, written to disk
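A minimal sketch of how the in-memory metadata can be rebuilt from the two on-disk pieces: load the fsimage snapshot, then replay the edit log on top of it. This is a toy Python model, not Hadoop's actual on-disk format; the `Inode` fields and the operation names (`mkdir`, `create`, `delete`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """One fsimage entry: owner, group, permissions and block IDs (toy model)."""
    user: str
    group: str
    permissions: str
    block_ids: list = field(default_factory=list)


def replay(fsimage, edit_log):
    """Return the in-memory namespace: fsimage snapshot + replayed edit log."""
    namespace = dict(fsimage)  # copy, so the snapshot itself stays untouched
    for op, path, *args in edit_log:
        if op in ("mkdir", "create"):
            user, group, permissions = args[:3]
            blocks = list(args[3]) if len(args) > 3 else []
            namespace[path] = Inode(user, group, permissions, blocks)
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


# Hypothetical snapshot and the operations recorded after it was taken.
fsimage = {"/data": Inode("hdfs", "hadoop", "rwxr-xr-x")}
edit_log = [
    ("create", "/data/a.txt", "alice", "hadoop", "rw-r--r--", [1001, 1002]),
    ("mkdir", "/tmp", "hdfs", "hadoop", "rwxrwxrwx"),
    ("delete", "/tmp"),
]

namespace = replay(fsimage, edit_log)
print(sorted(namespace))  # ['/data', '/data/a.txt']
```

The longer the edit log grows, the longer this replay takes at startup, which is why the log is periodically merged into a fresh fsimage.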
Hadoop CLI
- hadoop
- hdfs
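As an illustration of the CLI, the sketch below builds a few common `hadoop fs` invocations from Python and offers a helper that runs a command only when the binary is on the `PATH`. The helper names and the example paths are hypothetical; `-mkdir -p`, `-put` and `-ls` are standard file system shell options.

```python
import shutil
import subprocess


def hadoop_fs(*args):
    """Build a `hadoop fs` command line as an argument list."""
    return ["hadoop", "fs", *args]


def run_if_available(cmd):
    """Run cmd only if its binary is installed; return the exit code or None."""
    if shutil.which(cmd[0]) is None:
        return None  # no Hadoop client on this machine
    return subprocess.run(cmd, check=False).returncode


# Hypothetical paths, for illustration only.
commands = [
    hadoop_fs("-mkdir", "-p", "/user/demo/input"),
    hadoop_fs("-put", "local.txt", "/user/demo/input"),
    hadoop_fs("-ls", "/user/demo/input"),
]

for cmd in commands:
    print(" ".join(cmd))
```

On a machine with a configured Hadoop client, `run_if_available(commands[2])` would list the directory; elsewhere it returns `None` without running anything.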
Data Units
- Databases
- Tables
- Partitions
- Buckets
File formats
- Text File
- Sequence File
- AVRO File
- RC File
- ORC File
- Parquet File
- Custom INPUTFORMAT and OUTPUTFORMAT
Partitions = directories, buckets = files
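A sketch of how these data units can map onto HDFS paths, assuming a Hive-style layout: each partition becomes a `column=value` directory under the table directory, and each bucket becomes a file inside it. The warehouse root, the `.db` suffix, the bucket file naming (`000002_0`), and the integer hash (key modulo bucket count) are assumptions made for illustration; real tables may differ depending on configuration and column type.

```python
from pathlib import PurePosixPath

WAREHOUSE = PurePosixPath("/user/hive/warehouse")  # assumed warehouse root


def bucket_for(key: int, num_buckets: int) -> int:
    """Pick a bucket for an integer key (assumed: key modulo bucket count)."""
    return key % num_buckets


def bucket_file(db, table, partition, key, num_buckets):
    """Return the assumed path of the bucket file that would hold `key`."""
    # One directory level per partition column, in the order given.
    part_dirs = [f"{col}={val}" for col, val in partition.items()]
    name = f"{bucket_for(key, num_buckets):06d}_0"
    return WAREHOUSE.joinpath(f"{db}.db", table, *part_dirs, name)


# Hypothetical database, table and partition values.
path = bucket_file("sales", "orders", {"year": 2020, "month": 12},
                   key=42, num_buckets=8)
print(path)  # /user/hive/warehouse/sales.db/orders/year=2020/month=12/000002_0
```

Filtering on a partition column lets a query skip whole directories, while bucketing narrows the read to specific files within a directory.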
