William Blogs & More

Hadoop

Bootcamp | Bigdata2 | 2020-12-17

algorithm

Hadoop Ecosystem

Flume: ingestion of unstructured and semi-structured data (e.g., logs)
Sqoop: import/export of structured data between relational databases and HDFS
HDFS: distributed storage
YARN: resource management
Spark: in-memory data flow engine
Kafka & Storm: streaming
Solr & Lucene: searching and indexing
Oozie: workflow scheduling
MapReduce: batch processing; jobs can be written in different languages
Hive & Drill: analytical SQL-on-Hadoop
Mahout & Spark MLlib: machine learning
Pig: scripting
HBase: NoSQL database
ZooKeeper & Ambari: management and coordination
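MapReduce jobs do not have to be written in Java: Hadoop Streaming runs any executable that reads records from stdin and writes tab-separated key/value pairs to stdout, and the framework sorts the map output by key before the reduce phase. A minimal sketch, assuming only a Unix shell with tr, awk and sort and no cluster at all, that imitates the map, shuffle/sort and reduce phases of a word count with an ordinary pipeline. The `mapper` and `reducer` function names are illustrative, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Local imitation of a Hadoop Streaming word count (no cluster needed).
# mapper and reducer are illustrative names, not Hadoop commands.

# Map phase: split input into words, lowercase them, emit "word<TAB>1".
mapper() {
  tr -s '[:space:]' '\n' | tr '[:upper:]' '[:lower:]' | awk 'NF { print $0 "\t1" }'
}

# Reduce phase: input arrives sorted by key, so identical words are adjacent;
# sum the counts of each run of identical keys and emit "word<TAB>total".
reducer() {
  awk -F '\t' '
    $1 != prev { if (NR > 1) print prev "\t" sum; prev = $1; sum = 0 }
    { sum += $2 }
    END { if (NR > 0) print prev "\t" sum }
  '
}

# "sort" stands in for the shuffle/sort step Hadoop runs between the phases;
# LC_ALL=C makes the key order byte-wise and reproducible.
printf 'the quick fox\nThe lazy dog\nthe end\n' | mapper | LC_ALL=C sort | reducer
```

On a real cluster the same two scripts would be handed to the Hadoop Streaming jar as mapper and reducer; piping through `sort` locally is a cheap way to check their logic first.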

Hadoop vs Spark

Performance
hadoop:
spark:
Ease of Use
hadoop:
spark:
Costs
hadoop:
spark:
Data Processing
hadoop: batch processing
spark: batch and stream processing
Fault Tolerance
hadoop: data replication (HDFS) and re-execution of failed tasks
spark: lost RDD partitions are recomputed automatically from their lineage, i.e. the original transformations
Security
hadoop: server-side authentication, e.g. via LDAP
spark: public/private keys, shared secrets

Use cases

hadoop:
spark: graph processing, iterative processing, applications requiring stream processing

Commands

List the contents of your HDFS home directory (/user/<username>)

$ hadoop fs -ls

List the root directory of HDFS

$ hadoop fs -ls /

Copy a local file to HDFS (into your home directory)

$ hadoop fs -copyFromLocal foo.txt foo.txt

Copy a file from HDFS to the local disk

$ hadoop fs -copyToLocal /user/geert/foo.txt foo.txt

Display the contents of a file

$ hadoop fs -cat /data/as400/customers.txt

Create a new directory (a relative path such as output is created under your HDFS home directory)

$ hadoop fs -mkdir output
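The commands above can be chained into a simple round trip: create a directory, upload a file, list the directory, print the file and download a copy. A minimal sketch, assuming a configured Hadoop client on the PATH; the `hdfs_round_trip` function, the `HADOOP` override variable and the default `demo` directory are hypothetical and exist only so the script can be dry-run with `HADOOP=echo` when no cluster is available.

```shell
#!/usr/bin/env bash
# Round trip through HDFS using only the hadoop fs commands listed above.
# HADOOP is a hypothetical override: HADOOP=echo prints the commands
# instead of running them (a dry run without a cluster).
set -euo pipefail

hdfs_round_trip() {
  local local_file="$1"
  local dir="${2:-demo}"            # relative path: lands under your HDFS home directory
  local hadoop="${HADOOP:-hadoop}"  # real client unless overridden
  local name
  name="$(basename "$local_file")"

  $hadoop fs -mkdir "$dir"                              # create the directory
  $hadoop fs -copyFromLocal "$local_file" "$dir/$name"  # upload the file
  $hadoop fs -ls "$dir"                                 # list the directory
  $hadoop fs -cat "$dir/$name"                          # print the file
  $hadoop fs -copyToLocal "$dir/$name" "$name.copy"     # download a copy
}
```

Usage: `hdfs_round_trip foo.txt` against a cluster, or `HADOOP=echo hdfs_round_trip foo.txt` to see the five commands without running them. Without extra flags, `-mkdir` and `-copyFromLocal` fail if the target already exists, so a second run against the same directory will stop at the first step.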