  • Hadoop

    Bootcamp · Bigdata · 2 · 2020-12-17


    algorithm

    Hadoop

    • Flume: unstructured, semi-structured data
    • Sqoop: structured data
    • HDFS: Storage
    • YARN: Resource Management
    • Spark: In-Memory, Data Flow Engine
    • KAFKA & STORM: Streaming
    • Solr & Lucene: Searching & Indexing
    • OOZIE: Scheduling
    • MAPREDUCE: Processing using different languages
    • HIVE & DRILL: Analytical SQL-on-Hadoop
    • MAHOUT & Spark MLlib: Machine learning
    • PIG: Scripting
    • HBASE: NoSQL Database
    • ZooKeeper & Ambari: Management & Coordination
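The MapReduce entry above mentions processing in different languages. As a rough illustration of the model itself (not of Hadoop's own API), the map, shuffle/sort, and reduce phases can be imitated locally with a Unix pipeline. The function names `map_words`, `reduce_counts`, and `wordcount` are made up for this sketch, and nothing here talks to a cluster.

```shell
#!/usr/bin/env bash
# Word count as map -> shuffle/sort -> reduce, using only local Unix tools.
# This imitates the MapReduce data flow; it does not run on Hadoop.

# Map: emit one "word<TAB>1" pair per input word.
map_words() {
  tr -s '[:space:]' '\n' | awk 'NF { print $1 "\t1" }'
}

# Reduce: expects input sorted by key and sums the counts for each key.
reduce_counts() {
  awk -F '\t' '
    $1 != key { if (NR > 1) print key "\t" total; key = $1; total = 0 }
    { total += $2 }
    END { if (NR > 0) print key "\t" total }
  '
}

# Shuffle/sort: "sort" groups identical keys together between map and reduce.
wordcount() {
  map_words | LC_ALL=C sort | reduce_counts
}

printf 'hdfs yarn hdfs\nspark hdfs\n' | wordcount
```

The mapper and reducer only read stdin and write stdout, which is the same contract a streaming-style job would rely on, so each stage can be tested on its own with a small text file.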

    Hadoop vs Spark

    1. Performance
       • hadoop:
       • spark:
    2. Ease of Use
       • hadoop:
       • spark:
    3. Costs
       • hadoop:
       • spark:
    4. Data Processing
       • hadoop: batch processing
       • spark: stream processing (in addition to batch)
    5. Fault Tolerance
       • hadoop: replication, re-execution of job
       • spark: RDD is automatically recomputed by using the original transformations
    6. Security
       • hadoop: Server LDAP
       • spark: public/private key, secrets

    Use-cases

    1. hadoop
    2. spark: graph processing, iterative processing, applications requiring stream processing

    commands

    Get directory listing of user home directory in hdfs

    $ hadoop fs -ls

    Get directory listing of root directory

    $ hadoop fs -ls /

    Copy a local file to hdfs (user home directory)

    $ hadoop fs -copyFromLocal foo.txt foo.txt

    Copy a file from hdfs to local disk

    $ hadoop fs -copyToLocal /user/geert/foo.txt foo.txt

    Display the contents of a file

    $ hadoop fs -cat /data/as400/customers.txt

    Make a new directory (user home directory)

    $ hadoop fs -mkdir output
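The commands above can be chained into one small round trip: make a directory, upload a file, list it, print it, and copy it back. This is only a sketch. The wrapper `hdfs_cmd`, the function `hdfs_round_trip`, and the `DRY_RUN` variable are hypothetical names for this example, and the script defaults to printing the commands instead of running them, so it is safe to try without a cluster.

```shell
#!/usr/bin/env bash
# Sketch: upload, inspect, and download one file with "hadoop fs".
# DRY_RUN, hdfs_cmd and hdfs_round_trip are made-up names for this example.

# Print the command when DRY_RUN=1 (the default); run it when DRY_RUN=0.
hdfs_cmd() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "hadoop fs $*"
  else
    hadoop fs "$@"
  fi
}

# $1 = local file name, $2 = target directory (relative to the HDFS home dir).
hdfs_round_trip() {
  rt_file="$1"
  rt_dir="$2"
  hdfs_cmd -mkdir -p "$rt_dir" &&
  hdfs_cmd -copyFromLocal "$rt_file" "$rt_dir/$rt_file" &&
  hdfs_cmd -ls "$rt_dir" &&
  hdfs_cmd -cat "$rt_dir/$rt_file" &&
  hdfs_cmd -copyToLocal "$rt_dir/$rt_file" "$rt_file.copy"
}

hdfs_round_trip foo.txt output
```

Setting `DRY_RUN=0` before the call runs the real commands. The steps are joined with `&&` so a failed upload stops the script before it tries to read the file back.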