p4-mapreduce
heapq
This tutorial will show you how to use the heapq.merge()
function. It’s useful for implementing the Reduce Stage in MapReduce.
Example
The heapq.merge()
function is part of the Python standard library that implements merge functionality similar to merge sort. It merges multiple sorted inputs into a single sorted output. We have provided an example use case below.
Input
Given two sorted files:
sample1.txt
a 1
a 1
b 2
c 3
sample2.txt
a 2
c 2
d 3
Program
We can merge sample1.txt
and sample2.txt
into a single sorted output like this sample.py
.
import heapq
# Make a list of open file handles
files = []
files.append(open("sample1.txt"))
files.append(open("sample2.txt"))
# Merge
# heapq.merge(*files) is equivalent to heapq.merge(files[0], files[1])
for line in heapq.merge(*files):
print(line)
The sample code is not pylint-clean because open
should be used inside a context manager (with
). The best way to handle multiple open files is with contextlib.ExitStack
.
Output
Run our program and notice that the output is sorted.
$ python3 sample.py
a 1
a 1
a 2
b 2
c 2
c 3
d 3
Acknowledgments
Original document written by Zachary Papanastasopoulos and Andrew DeOrio awdeorio@umich.edu.
This document is licensed under a Creative Commons Attribution-NonCommercial 4.0 License. You’re free to copy and share this document, but not to sell it. You may not share source code provided with this document.