Performance and Energy Optimization on Terasort Algorithm by Task Self-Resizing
Among MapReduce applications, Terasort is one of the best known: it helped Hadoop win first prize in the Sort Benchmark. Although Terasort is known for its speed in sorting big data, its performance and energy consumption can still be improved. We analyze the characteristics of Terasort and find that some nodes sit idle during execution, which both wastes energy and degrades performance. We therefore optimize Terasort with a task self-resizing algorithm, which reduces the time and energy consumed while map nodes wait for tasks and reducer nodes wait for inputs. A series of experiments shows that the proposed algorithm is effective in improving performance and reducing energy consumption, and it could also be adapted to other applications in a MapReduce environment.