Hadoop
Bootcamp Bigdata 2 2020-12-17
algorithm
Hadoop
- Flume: ingestion of unstructured and semi-structured data
- Sqoop: ingestion of structured data
- HDFS: Storage
- YARN: Resource Management
- Spark: In-Memory Data Flow Engine
- Kafka & Storm: Streaming
- Solr & Lucene: Searching & Indexing
- Oozie: Scheduling
- MapReduce: Processing using different languages
- Hive & Drill: Analytical SQL-on-Hadoop
- Mahout & Spark MLlib: Machine learning
- Pig: Scripting
- HBase: NoSQL Database
- ZooKeeper & Ambari: Management & Coordination
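The MapReduce entry above mentions processing in different languages. Assuming this refers to Hadoop Streaming, where the mapper and reducer are ordinary programs that read stdin and write stdout, the data flow can be tried locally with a plain pipe. This is a minimal sketch, not a cluster job: the function names `mapper`, `reducer` and `word_count` are made up for illustration, and `sort` only stands in for the shuffle phase.

```shell
# Mapper: emit one "word<TAB>1" pair per word read from stdin.
mapper() {
  tr -s '[:space:]' '\n' | awk 'NF { print tolower($0) "\t1" }'
}

# Reducer: sum the counts per key. Input must already be sorted by key,
# which is what the shuffle phase guarantees on a real cluster.
reducer() {
  awk -F '\t' '
    $1 != key { if (NR > 1) print key "\t" sum; key = $1; sum = 0 }
    { sum += $2 }
    END { if (NR > 0) print key "\t" sum }
  '
}

# Local stand-in for map -> shuffle -> reduce (hypothetical helper).
word_count() {
  mapper | LC_ALL=C sort | reducer
}
```

Usage: `printf 'to be or not to be\n' | word_count` prints one tab-separated word and count per line. On a cluster the same two programs would be handed to the Hadoop Streaming jar instead of being connected with a pipe.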
Hadoop vs Spark
- Performance
  - Hadoop:
  - Spark:
- Ease of Use
  - Hadoop:
  - Spark:
- Costs
  - Hadoop:
  - Spark:
- Data Processing
  - Hadoop: batch processing
  - Spark: batch and stream processing
- Fault Tolerance
  - Hadoop: replication, re-execution of failed jobs
  - Spark: a lost RDD is automatically recomputed from the original transformations
- Security
  - Hadoop: LDAP server
  - Spark: public/private key, secrets
Use-cases
- Hadoop:
- Spark: graph processing, iterative processing, applications requiring stream processing
Commands
Get directory listing of user home directory in HDFS
$ hadoop fs -ls
Get directory listing of root directory
$ hadoop fs -ls /
Copy a local file to HDFS (user home directory)
$ hadoop fs -copyFromLocal foo.txt foo.txt
Copy a file from HDFS to local disk
$ hadoop fs -copyToLocal /user/geert/foo.txt foo.txt
Display the contents of a file
$ hadoop fs -cat /data/as400/customers.txt
Make a new directory (user home directory)
$ hadoop fs -mkdir output
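The commands above can be chained into a small round trip: create a directory, upload a file, list and read it, then copy it back. The sketch below is only an illustration. `run_hdfs`, `hdfs_round_trip` and the `DRY_RUN` switch are hypothetical helpers, not part of Hadoop, and by default the script only prints the commands, so it can be tried on a machine without a cluster.

```shell
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to call the real CLI.
DRY_RUN="${DRY_RUN:-1}"

# Wrapper around "hadoop fs" that honours the dry-run switch (hypothetical helper).
run_hdfs() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "hadoop fs $*"
  else
    hadoop fs "$@"
  fi
}

# Round trip for a local file $1 through the HDFS directory $2.
# Each step runs only if the previous one succeeded.
hdfs_round_trip() {
  run_hdfs -mkdir -p "$2" &&
  run_hdfs -copyFromLocal "$1" "$2/$1" &&
  run_hdfs -ls "$2" &&
  run_hdfs -cat "$2/$1" &&
  run_hdfs -copyToLocal "$2/$1" "$1.copy"
}
```

Usage: `hdfs_round_trip foo.txt output` prints the five `hadoop fs` commands in order. Running it with `DRY_RUN=0` executes them, assuming `hadoop` is on the PATH and `foo.txt` exists locally.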