OVERVIEW
keywords
- Distributed System: a system in which components located at networked computers communicate and coordinate their actions only by passing messages.
- DSM: Distributed Shared Memory
- DFS: Distributed File System, which lets clients operate on remote files as if they were in local storage
- Distributed transactions
key concerns for distributed systems:
- concurrency control
- availability
- scalability
- reliability and fault tolerance
key concerns for DFS:
- naming
- caching (write policy; cache consistency)
- semantics (a read returns the data written by the most recent write operation)
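The caching and semantics concerns can be illustrated with a minimal sketch, assuming a write-through policy and a version check on every read. `RemoteFile` and `WriteThroughClient` are hypothetical names made up for this note, not part of any real DFS API, and the "server" is just an in-process object.

```python
class RemoteFile:
    """Stands in for a file held by a (hypothetical) file server."""

    def __init__(self, data=b""):
        self.data = data
        self.version = 0  # bumped on every write

    def write(self, data):
        self.data = data
        self.version += 1


class WriteThroughClient:
    """Client-side cache: writes go through to the server, reads validate the cache."""

    def __init__(self, server_file):
        self.server_file = server_file
        self.cached_data = None
        self.cached_version = -1  # nothing cached yet

    def write(self, data):
        # Write policy: write-through, so the server copy is updated immediately.
        self.server_file.write(data)
        self.cached_data = data
        self.cached_version = self.server_file.version

    def read(self):
        # Cache consistency: refetch when the server copy is newer than the cache.
        if self.cached_version != self.server_file.version:
            self.cached_data = self.server_file.data
            self.cached_version = self.server_file.version
        return self.cached_data


shared = RemoteFile()
client_a = WriteThroughClient(shared)
client_b = WriteThroughClient(shared)
client_a.write(b"v1")
print(client_b.read())  # client_b sees client_a's latest write
```

A write-back policy would delay the server update, which makes writes cheaper but means another client can read stale data until the flush happens.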
HDFS
HDFS doesn’t support random writes: files in HDFS are ‘write once’, and HDFS is optimized for large, streaming reads of files rather than random reads.
HDFS’s write-once-read-many model relaxes concurrency control requirements, simplifies data coherency, and enables high-throughput access. – https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
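To make the write-once-read-many idea concrete, here is a toy in-memory model. It is only a sketch: `WriteOnceStore` and its methods are hypothetical and do not mirror the HDFS client API. The assumption is that a file can be written sequentially while it is open and becomes immutable once closed.

```python
class WriteOnceStore:
    """Toy file store: sequential writes until close, then read-only."""

    def __init__(self):
        self._files = {}    # path -> bytearray with the file contents
        self._open = set()  # paths that are still being written

    def create(self, path):
        if path in self._files:
            raise FileExistsError(f"{path} already exists and cannot be rewritten")
        self._files[path] = bytearray()
        self._open.add(path)

    def write(self, path, chunk):
        # Only sequential writes to a file that has not been closed yet.
        if path not in self._open:
            raise PermissionError(f"{path} is closed; files are write-once")
        self._files[path] += chunk

    def close(self, path):
        self._open.discard(path)

    def read(self, path, offset=0, length=None):
        data = self._files[path]
        end = len(data) if length is None else offset + length
        return bytes(data[offset:end])


store = WriteOnceStore()
store.create("/logs/part-0")
store.write("/logs/part-0", b"hello ")
store.write("/logs/part-0", b"world")
store.close("/logs/part-0")
print(store.read("/logs/part-0"))
```

Because closed files never change, any number of readers can stream the same file without locking, which is the relaxation of concurrency control that the quoted design note refers to.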
key features of HDFS
- replication
- rack-awareness
- write-once, read-many
- computation closer to data (bring the program to the data)
- support for multiple clients (FS shell, Java API for Spark, Flume, Hue …)
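The replication and rack-awareness features can be sketched as follows. For a replication factor of three, the HDFS design document cited above describes placing one replica on a node in the local rack, another on a node in a different rack, and the last on a different node in that same remote rack. The function below is a simplified illustration of that rule only: `place_replicas` and the `topology` dictionary are hypothetical, and the real NameNode also considers factors such as node load and free space.

```python
import random


def place_replicas(topology, writer_node, rng=random):
    """Pick three nodes for a block.

    topology: dict mapping rack name -> list of node names (assumed layout).
    writer_node: the node where the writing client runs.
    rng: the random module or a random.Random instance.
    """
    # Replica 1: the writer's own node, which sits in the local rack.
    local_rack = next(
        rack for rack, nodes in topology.items() if writer_node in nodes
    )

    # Replicas 2 and 3: two different nodes in one remote rack.
    remote_racks = [
        rack
        for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two nodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)

    return [writer_node, second, third]


topology = {
    "rack1": ["n1", "n2"],
    "rack2": ["n3", "n4"],
}
print(place_replicas(topology, "n1"))
```

Keeping two replicas in one remote rack limits cross-rack write traffic while still surviving the loss of an entire rack.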