## CSci 3501 Algorithms and Computability - Lab 12.

#### Due Wednesday, December 9 at 11:59pm

What to submit and when:

• All submissions are electronic: by e-mail to and CC to all lab partners. Please do not delete your e-mail from "Sent mail" or your mailbox until the end of the semester.
• When working on the lab, please comment your work so that it is clear what contributions of each person are.
• At the end of the lab each group should send me the results of their in-class work. Please indicate if this is your final submission.
• If your submission at the end of the lab time was not final, please send me(CC to the lab partner(s)) a final copy before the due time. Please use the subject "3501 Lab N", where N is the lab number.

### Lab assignment

Work in pairs

#### Lab overview and goals

The goal of the lab is to explore Huffman encoding as a greedy algorithm.

#### Implementing Huffman encoding (30 points)

Huffman encoding is an approach to data encoding that minimizes storage by assigning shorter codes to more frequent letters and longer codes to more frequent ones. The encoding is known as prefix-free which means that no letter code is a prefix of any other one. For instance, if the most frequent letter is encoded as 0 then no other letter code starts with 0. Thus decoding is unambiguous at every point. Huffman codes are described in section 16.3 (p. 428) in CLRS and on the wikipedia page https://en.wikipedia.org/wiki/Huffman_coding Huffman encoding is an optimal encoding (i.e. it produces the shortest encoding of 0s and 1s) for a text that follows the specified letter frequencies.

As an example, consider an alphabet of 5 letters with frequencies given below:

 Letter: a b c d e Frequency, %: 51 22 15 7 5

The algorithms for finding Huffman code forms a binary tree in which nodes represent either a single letter or a combination of letters. Initially all letters are placed in a min-priority queue, where the key is the frequency. The algorithms proceeds by combining two nodes with the least frequencies into one that has the combined frequency, and adding it back into the queue. The process continues until there is only one combined node which is the root of the trree. Then each left branch in the tree is denoted by 0 and the right branch by 1. The codes for each letter are the paths to them in the tree.
The textbook or the compression section of the wikipedia page provide more details.

The encoding for the example above is:

 Letter: a b c d e Encoding: 1 00 011 0101 0100

As an exercise, encode the strings ace and add according to the above encoding. Then decode the strings 101011 and 0010101. Note how prefix-free encoding makes decoding easy.

Your goal is to write a program that reads in an alphabet with letter frequencies and produces Huffman codes for this letters. It then allows you to encode and decode texts in this alpahbet using the Huffman encoding you have constructed. More specifically:

1. Your program must read data in the format (on one line):
a 51 b 22 c 15 d 7 e 5

The frequences are assumed to be in decreasing order and add up to 100%. New line signals end of input.
2. It should put output encodings for each letter.
3. It should then take inputs in the format: