Apache Tez

A Framework for YARN-based Data Processing Applications in Hadoop

Apache™ Tez is an extensible framework for building high-performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves on the MapReduce paradigm by dramatically increasing its speed while preserving MapReduce's ability to scale to petabytes of data. Major Hadoop ecosystem projects such as Apache Hive and Apache Pig use Apache Tez, as do a growing number of third-party data access applications developed for the broader Hadoop ecosystem.

The Apache Tez™ project aims to build an application framework that expresses data processing as a complex directed acyclic graph (DAG) of tasks. It is currently built atop Apache Hadoop YARN.
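To make the DAG idea concrete: in classic MapReduce, a multi-stage query becomes a chain of separate jobs, whereas in Tez the stages become vertices of a single graph, and edges describe how data moves between them. The following is a minimal sketch, assuming a plain in-memory graph; the vertex names and the `topological_order` helper are hypothetical and are not part of the Tez API (which is Java). It only shows why the graph must be acyclic: a valid execution order, in which every producer runs before its consumers, exists only when there are no cycles.

```python
from collections import deque


# Hypothetical model of a Tez-style job: vertices are processing stages,
# edges are (producer, consumer) data movements. This is NOT the Tez API.
def topological_order(vertices, edges):
    """Return vertices in an order where every producer precedes its consumers.

    Raises ValueError if the edges contain a cycle, i.e. the graph is not a DAG.
    """
    indegree = {v: 0 for v in vertices}
    consumers = {v: [] for v in vertices}
    for src, dst in edges:
        consumers[src].append(dst)
        indegree[dst] += 1

    # Start with vertices that consume nothing (e.g. table scans).
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        # A consumer becomes runnable once all of its producers are scheduled.
        for c in consumers[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)

    if len(order) != len(vertices):
        raise ValueError("edges contain a cycle; not a DAG")
    return order


# A join-then-aggregate job expressed as one DAG instead of chained jobs.
vertices = ["scan_orders", "scan_customers", "join", "aggregate"]
edges = [
    ("scan_orders", "join"),
    ("scan_customers", "join"),
    ("join", "aggregate"),
]
print(topological_order(vertices, edges))
```

Here the two scans have no producers, so they can start immediately, and `join` waits for both; Kahn's algorithm is used simply because it detects cycles as a side effect.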

The two main design themes for Tez are:

Empowering end users through:

• Expressive dataflow definition APIs

• Flexible Input-Processor-Output runtime model

• Data-type agnosticism

• Simplifying deployment
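The Input-Processor-Output model splits each task into pluggable inputs that supply records, a processor that holds the user's logic, and outputs that accept results. The sketch below is a hypothetical, simplified illustration of that separation; `ListInput`, `ListOutput`, `Task` and `word_count` are invented names, not Tez classes. Because the task wiring never inspects the records, it is agnostic to their type; only the processor interprets them.

```python
from typing import Callable, Iterable, Iterator, List


# Hypothetical sketch of the Input-Processor-Output model; not Tez classes.
class ListInput:
    """Input: supplies records to the processor from some source."""

    def __init__(self, records: Iterable):
        self._records = list(records)

    def read(self) -> Iterator:
        return iter(self._records)


class ListOutput:
    """Output: collects records emitted by the processor."""

    def __init__(self):
        self.records: List = []

    def write(self, record) -> None:
        self.records.append(record)


class Task:
    """Wires inputs and outputs to a processor function.

    The task only passes the read/write endpoints along and never looks at
    the records, so swapping sources, sinks or record types needs no change here.
    """

    def __init__(self, inputs, processor: Callable, outputs):
        self.inputs = inputs
        self.processor = processor
        self.outputs = outputs

    def run(self) -> None:
        self.processor(self.inputs, self.outputs)


def word_count(inputs, outputs):
    """Example processor: counts words across all inputs (expects strings)."""
    counts = {}
    for inp in inputs:
        for line in inp.read():
            for word in line.split():
                counts[word] = counts.get(word, 0) + 1
    for out in outputs:
        for word in sorted(counts):
            out.write((word, counts[word]))


out = ListOutput()
Task([ListInput(["a b", "b c"]), ListInput(["c c"])], word_count, [out]).run()
print(out.records)
```

Replacing `ListInput` with, say, a reader over files, or `word_count` with a different processor, leaves the other two pieces untouched, which is the flexibility this design aims for.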

Improving execution performance through:

• Performance gains over MapReduce

• Optimal resource management

• Plan reconfiguration at runtime

• Dynamic physical data flow decisions
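One example of a runtime plan decision is choosing how many downstream tasks to launch only after upstream tasks report how much data they produced, instead of fixing that number when the job is submitted. Tez exposes decisions of this kind through pluggable vertex managers; the function below is a simplified, hypothetical version of the idea, and its name, parameters and sizing rule are assumptions rather than Tez's actual algorithm.

```python
import math


# Hypothetical illustration of a runtime plan decision: downstream parallelism
# is computed from observed upstream output sizes, not fixed at submit time.
def choose_parallelism(source_output_bytes, configured_tasks, desired_bytes_per_task):
    """Return how many downstream tasks to actually launch.

    Never exceeds the configured count; shrinks it when the observed data is
    small enough that fewer tasks can each handle about desired_bytes_per_task.
    Always launches at least one task.
    """
    total = sum(source_output_bytes)
    needed = max(1, math.ceil(total / desired_bytes_per_task))
    return min(configured_tasks, needed)


MB = 1024 * 1024
# Planned for 100 downstream tasks, but upstream produced only 300 MB in total.
print(choose_parallelism([120 * MB, 100 * MB, 80 * MB], 100, 64 * MB))
```

With 300 MB and a 64 MB target per task, five tasks suffice, so the remaining 95 planned tasks are never launched; capping at the configured count keeps the decision from exceeding what the user planned for.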
