Hadoop: The Computing Framework Used in DxChain

in #bigdata · 6 years ago

The DxChain Network is designed to serve as a data trading platform for users who want to sell their data. It is the world's first decentralized big data and machine learning network, and it incorporates Hadoop, the industry-proven big data platform, as its computation engine.

Hadoop is an open source distributed processing framework that manages storage and data processing for big data applications running on clustered systems. It sits at the center of a growing ecosystem of big data technologies whose goal is to use that data for machine learning and predictive analytics. Hadoop can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing and analysing data than relational databases and data warehouses provide.

Hadoop runs on clusters of commodity servers and can scale up to support thousands of hardware nodes and massive amounts of data. It uses a namesake distributed file system that is designed to provide rapid data access across the nodes in a cluster. It also provides fault tolerance: if individual nodes fail, applications can continue to run.

The core components of Hadoop are Hadoop Distributed File System (HDFS), Yet Another Resource Negotiator (YARN), MapReduce and Hadoop Common. HDFS is a file system that manages storage of and access to data distributed across the various nodes of a Hadoop cluster. YARN is Hadoop’s cluster resource manager, responsible for allocating system resources to applications and scheduling jobs.
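Below is a minimal sketch, not DxChain code, of how a client reads and writes a file on HDFS through Hadoop's Java FileSystem API. The NameNode address and file paths are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the NameNode; host and port are placeholders.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:9000");
        FileSystem fs = FileSystem.get(conf);

        // Write a small file. HDFS replicates its blocks across DataNodes,
        // which is what provides the fault tolerance described above.
        Path path = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back from the cluster.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
        fs.close();
    }
}
```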

MapReduce is the heart of Hadoop. It is a programming framework and processing engine used to run large-scale batch applications in Hadoop systems, and it enables massive scalability across hundreds of servers. A job comprises two steps. In the map step, the input is partitioned into smaller subproblems that are distributed to worker nodes and processed in parallel. In the reduce step, the answers to all of the subproblems are collected and combined to produce the final output (see the word-count sketch below). Hadoop Common is a set of utilities and libraries that provide the underlying capabilities required by the other pieces of Hadoop.
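The following word-count sketch uses the standard Hadoop MapReduce API purely to illustrate the map and reduce steps; it is a generic example, not DxChain code, and the input/output paths are assumed to come from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map step: each mapper processes a split of the input and emits (word, 1) pairs.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce step: all counts for the same word arrive at one reducer,
    // which sums them into the final result.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output locations are assumed to be passed as arguments.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```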

Hadoop has a huge collection of tools that make common tasks easier, such as HBase, Spark, Pig and Hive. HBase and Spark are not used by DxChain because of HBase's latency and Spark's need for high-memory machines, respectively. Pig provides a high-level language for data analysis; it supports database-style operations and analysis of non-structured data. If the data is stored as plain text, Pig is a perfect tool to parse and analyse it on the fly. Hive facilitates reading, writing and managing large datasets residing in distributed storage using SQL.
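As a small sketch of the SQL-style access Hive offers, the Java snippet below runs a query against HiveServer2 over JDBC. The host, database, table and credentials are placeholder assumptions, and the hive-jdbc driver is assumed to be on the classpath; this is not DxChain's DHive.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC URL; host, port and database are placeholders.
        String url = "jdbc:hive2://hive-server.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "demo", "");
             Statement stmt = conn.createStatement();
             // Example aggregation over a hypothetical "sales" table.
             ResultSet rs = stmt.executeQuery(
                 "SELECT category, COUNT(*) FROM sales GROUP BY category")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```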

The DxChain Network supports database schemas, and SQL is used for business intelligence related operations. Since Pig and Hive are not time-sensitive and do not require high-memory machines, they are good fits for the DxChain Network. The DxChain team intends to develop DPig and DHive to facilitate computation running on the DxChain Network. More projects will potentially follow.

Use my referral link: https://t.me/DxChainBot?start=n0b5wh-n0b5wh

DxChain Website: https://www.dxchain.com/
