Distributed System: is one in which components located at networked computers communicate and coordinate their actions only by passing messages. DSM: Distributed Shared Memory DFS: Distributed File System: operating remote files like local stroage Distributed transactions

key concerns for distributed system:

concurrency control availability scalability reliability and fault tolerrance

key concerns for DFS:

naming cache (writing policy; cache consistency) semantics (read returns the data due to latest write operation)


HDFS doesn’t support random write, files in HDFS are ‘write once’, and HDFS is optimized for large, streaming reads of files rather than random reads,

HDFS’s write-once-read-many model that relaxes concurrency control requirements, simplifies data coherency, and enables high-throughput access. – https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

key features of HDFS

replica rack-awareness write-once, read many computation closer to data (bring program to data) multi clients supprot (fs shell, java api for spark, flume, hue …)


A comic-like explanation of HDFS