1. Distributed and Parallel Computing
A distributed system is one in which components located at networked computers communicate and coordinate their actions only by passing messages.
Distributed Shared Memory (DSM)
Distributed File System (DFS)
Parallel Computing Systems (a subset of Concurrent Computing)
- In parallel computing, all processors may have access to a shared memory, which they use to exchange information.
- In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors.
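The two models above can be contrasted with a minimal sketch. It assumes threads inside one Python process stand in for processors; a real distributed system would send its messages over a network rather than through an in-process queue. The function names shared_memory_sum and message_passing_sum are hypothetical and chosen only for illustration.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Shared-memory style: every worker updates one shared total."""
    total = {"value": 0}          # memory visible to all workers
    lock = threading.Lock()       # protects the shared total from races

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total["value"] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(chunks):
    """Message-passing style: workers keep private state and send results."""
    mailbox = queue.Queue()       # stands in for the network channel

    def worker(chunk):
        partial = sum(chunk)      # private to this worker
        mailbox.put(partial)      # the only way information leaves the worker

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The coordinator receives exactly one message per worker.
    return sum(mailbox.get() for _ in chunks)


if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5]]
    print(shared_memory_sum(data), message_passing_sum(data))
```

Both functions compute the same result; the difference is whether workers coordinate through a locked shared variable or only through the messages they send.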
Apache Hadoop = HDFS + MapReduce. Apache Hadoop is designed for batch processing, not real-time processing.
The Hadoop ecosystem includes more than Apache Hadoop MapReduce: it refers to the various components of the Apache Hadoop software library, to the accessories and tools provided by the Apache Software Foundation for these types of software projects, and to the ways that they work together.
2.1 Data Storage
Column Family: HBase
Graph: Neo4j, FlockDB, InfiniteGraph
2.2 Data Ingestion
normal batch mode
2.3 Data Processing
MapReduce (v2 ships with YARN), Spark
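To make the MapReduce model concrete, here is a minimal word-count sketch in plain Python. It does not use the Hadoop or Spark APIs; the names map_phase, shuffle, reduce_phase, and word_count are hypothetical, and everything runs in one process, whereas a real framework would distribute the map and reduce tasks across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["a b a", "B c"]))
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase only on its own key, the work can be split across machines, which is what makes the model suit batch processing of large inputs.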