HDFS (Hadoop Distributed File System) is the distributed storage component of Apache Hadoop, a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. HDFS stores large files as blocks replicated across the machines of a cluster. Repository - https://git-wip-us.apache.org/repos/asf?p=hadoop.git