This month Hortonworks released the latest version of Hortonworks DataFlow (HDF), version 1.1. The Hortonworks Data Platform, popularly called HDP, is the major project and product of Hortonworks, built on top of the open-source Hadoop ecosystem.
Now, do Hortonworks DataFlow and Hortonworks Data Platform represent the same thing?
No. HDP, the Hortonworks Data Platform, is open-source Hadoop bundled in a packaged format. Its installer selects the components that form part of the Hadoop project and bundles them correctly. Because the many components in the Hadoop ecosystem release new versions at different points in time, and compatibility between them is not always guaranteed, HDP is a stable option for enterprises that want Hadoop implemented as a customized, tested package installed through an installer.
Hortonworks DataFlow, on the other hand, is Apache NiFi. It is a GUI tool used to design dataflows out of processors, which are data extraction engines designed to work with many different data sources. The data it collects is meant to feed and enrich Hadoop. There are around 90 processors in HDF that can, for example, get files from the local file system (GetFile) or extract information from Twitter (GetTwitter). This information can then be put into HDFS, the Hadoop Distributed File System, and the dataflow itself is designed by connecting processors through relationships. Once the processors have been dragged and dropped in the GUI, their properties are configured and the relationships between them are established, the dataflow can be started.
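The "configure properties, establish relationships, then start" sequence can be modeled as a validation step. The sketch below is a hypothetical, simplified model and not NiFi's own API: it assumes a processor can start only when its required properties are set and each of its relationships is either connected to another processor or auto-terminated, which mirrors how the NiFi canvas flags invalid processors. The `Processor` class, `validate_flow` function and the small property subset shown are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical, simplified stand-in for a processor on the NiFi canvas."""
    name: str
    relationships: set[str]                                   # e.g. {"success", "failure"}
    properties: dict[str, str] = field(default_factory=dict)  # values configured in the UI
    required: set[str] = field(default_factory=set)           # properties that must be set
    auto_terminated: set[str] = field(default_factory=set)    # relationships whose data is dropped


def validate_flow(processors, connections):
    """Return a list of problems; an empty list means every processor could be started.

    connections is a set of (source, relationship, destination) tuples.
    """
    problems = []
    names = {p.name for p in processors}
    for src, rel, dst in connections:
        if src not in names or dst not in names:
            problems.append(f"connection {src} -[{rel}]-> {dst} references an unknown processor")
    for p in processors:
        # Every required property must be configured.
        for prop in sorted(p.required - set(p.properties)):
            problems.append(f"{p.name}: property '{prop}' is not set")
        # Every relationship must be connected downstream or auto-terminated.
        connected = {rel for src, rel, _ in connections if src == p.name}
        for rel in sorted(p.relationships - connected - p.auto_terminated):
            problems.append(f"{p.name}: relationship '{rel}' is neither connected nor auto-terminated")
    return problems


# Usage: GetFile -> PutHDFS, with PutHDFS's "failure" relationship left dangling.
get_file = Processor("GetFile", {"success"}, {"Input Directory": "/data/in"}, {"Input Directory"})
put_hdfs = Processor("PutHDFS", {"success", "failure"}, {"Directory": "/landing"}, {"Directory"},
                     auto_terminated={"success"})
flow = {("GetFile", "success", "PutHDFS")}
print(validate_flow([get_file, put_hdfs], flow))
# prints ["PutHDFS: relationship 'failure' is neither connected nor auto-terminated"]
```

In a real flow, auto-terminating "failure" silently discards data, so routing it to a retry or alerting processor is usually the safer design choice.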
In short, HDF is for designing dataflows, while HDP is the Apache Hadoop platform that supports enterprise big data projects, starting with HDFS, the Hadoop Distributed File System.
Apache NiFi is the data ingestion tool that has been customized as Hortonworks DataFlow. The processor is its basic component: it helps collect and aggregate the correct information to be processed and pushed onto HDFS. As of this writing, more than 90 processors come as an integral part of Apache NiFi, so it helps to have an easy way to locate the right one:
1) Drag and drop the Processor icon from the toolbar of the Apache NiFi web user interface onto the canvas; this opens the Add Processor dialog.
2) Click a tag to filter the processors by usage. For example, the ingest tag lists processors such as those whose names start with Get.
3) Alternatively, type the processor name in the filter box and add the processor.
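Steps 2 and 3 amount to filtering a catalog by tag or by name. The sketch below roughly mimics that behavior and is not NiFi code: `CATALOG` is a hypothetical, abbreviated list that uses a few real processor names with illustrative tags (not their complete or authoritative tag sets), and `find_processors` is a hypothetical helper that matches a query as a substring of the name or as an exact tag.

```python
# Hypothetical, abbreviated catalog: real NiFi processor names, illustrative tags.
CATALOG = {
    "GetFile": {"local", "files", "filesystem", "ingest", "get", "source"},
    "GetHTTP": {"get", "fetch", "poll", "http", "https", "ingest", "source", "input"},
    "GetTwitter": {"twitter", "tweets", "social media", "status", "json"},
    "PutHDFS": {"hadoop", "hdfs", "put", "copy", "filesystem"},
    "PutFile": {"put", "local", "copy", "archive", "files", "filesystem"},
}


def find_processors(query, catalog=CATALOG):
    """Return processor names matching the query, sorted alphabetically.

    A processor matches if the query is a substring of its name (step 3)
    or exactly equals one of its tags (step 2). Matching is case-insensitive.
    """
    q = query.strip().lower()
    return sorted(
        name for name, tags in catalog.items()
        if q in name.lower() or q in tags
    )


print(find_processors("ingest"))  # tag search, as in step 2
print(find_processors("hdfs"))    # name search, as in step 3
```

Exact tag matching is a deliberate choice here: substring matching on tags would, for example, make "put" match the "input" tag of GetHTTP.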
Hortonworks DataFlow, the GUI that helps collect, conduct and curate distributed data from structured and unstructured data sources and push it into Hadoop for enrichment, has a new release: Hortonworks DataFlow 1.1.1, now available for download. It was released on January 3rd, 2016 and can be downloaded from the following link:
Hortonworks DataFlow is powered by Apache NiFi and has the same GUI as Apache NiFi. It is also an integrated platform that can transport data ranging from something as small as Twitter feeds into HDFS to something as large as a data source that streams information continuously. The data is also transported securely.
The processor is the basic component that comes out of the box with Apache NiFi, and it is the core component of any dataflow. You can choose the appropriate processor from among the roughly 90 available by searching for its name or tag in the search box. Processors are dragged and dropped in the GUI, which is accessed on port 8080. Each unit of incoming information or data is referred to as a FlowFile. Typically, relationships connect processors to one another and, ultimately, to the datastore, which in a Hadoop environment is HDFS.
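To make the FlowFile and relationship ideas concrete, here is a minimal sketch under stated assumptions. It is a toy model, not NiFi internals: a FlowFile is represented as content plus key/value attributes, `get_file` and `put_hdfs` are hypothetical stand-ins for the GetFile and PutHDFS processors, `HDFS` is an in-memory dictionary standing in for the real file system, and `CONNECTIONS` maps each (processor, relationship) pair to the next processor, with `None` meaning auto-terminated.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Simplified model: a FlowFile is content plus key/value attributes."""
    content: bytes
    attributes: dict[str, str] = field(default_factory=dict)


HDFS = {}  # hypothetical in-memory stand-in for the HDFS datastore


def get_file(name, data):
    """Source step (hypothetical GetFile stand-in): wrap incoming data as a FlowFile."""
    return "success", FlowFile(data, {"filename": name})


def put_hdfs(ff, directory="/landing"):
    """Sink step (hypothetical PutHDFS stand-in): write the content, routing by outcome."""
    path = f"{directory}/{ff.attributes['filename']}"
    if path in HDFS:
        return "failure", ff  # assumed behavior: refuse to overwrite an existing file
    HDFS[path] = ff.content
    return "success", ff


STEPS = {"PutHDFS": put_hdfs}

# (processor, relationship) -> next processor; None means auto-terminated.
CONNECTIONS = {
    ("GetFile", "success"): "PutHDFS",
    ("PutHDFS", "success"): None,
    ("PutHDFS", "failure"): None,
}


def run(name, data):
    """Follow relationships from GetFile until one is auto-terminated.

    Returns the last processor reached and the relationship it emitted.
    """
    current, (rel, ff) = "GetFile", get_file(name, data)
    while (nxt := CONNECTIONS[(current, rel)]) is not None:
        current = nxt
        rel, ff = STEPS[current](ff)
    return current, rel


print(run("tweets.json", b'{"text": "hello"}'))  # first write succeeds
print(run("tweets.json", b'{"text": "again"}'))  # same path is routed to failure
```

Keeping routing in a separate `CONNECTIONS` table, rather than inside each step, reflects the GUI model: processors only report a relationship, and the connections drawn on the canvas decide where the FlowFile goes next.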