CSci 3501 Algorithms and Computability - Lab 11.

Due Wednesday, December 6 at 11:59pm

What to submit and when:

Lab assignment

Work in pairs

Lab overview and goals

The goal of the lab is to get practice with push-down automata, the pumping lemma for context-free languages, and Turing machines.
For the second part of the lab the goal is to explore Huffman encodings.

Task 1: push-down automata (6 points)

Please refer to Constructing a push-down automaton section in JFLAP.

  1. A pushdown automaton for the language of strings a^k followed by any number of b followed by c^k (do not convert your grammar from the previous question into an automaton or vice versa)
  2. A pushdown automaton for the language of strings a^n b^m where n <= m.

Task 2: Play the context-free "pumping lemma game" (7 points)

Use the tutorial for the pumping lemma. Play the "pumping lemma game" for the following examples. For each example state whether the language is context-free; justify it based on which side has a winning strategy in the pumping lemma game. Clearly describe the strategy.

Task 3: Turing machines (12 points)

Use JFLAP tutorial on Turing machines to construct and test a Turing machine that decides the language a^n b^n c^n, where n >= 0.

Task 4: Implementing Huffman encoding (30 points)

Huffman encoding is an approach to data encoding that minimizes storage by assigning shorter codes to more frequent letters and longer codes to more rare ones. The encoding is known as prefix-free which means that no letter code is a prefix of any other one. For instance, if the most frequent letter is encoded as 0 then no other letter code starts with 0. Thus decoding is unambiguous at every point. Huffman codes are described in section 16.3 (p. 428) in CLRS and on the wikipedia page https://en.wikipedia.org/wiki/Huffman_coding Huffman encoding is an optimal encoding (i.e. it produces the shortest encoding of 0s and 1s) for a text that follows the specified letter frequencies.

As an example, consider an alphabet of 5 letters with frequencies given below:

Letter: a b c d e
Frequency, %: 51 22 15 7 5

The algorithms for finding Huffman code forms a binary tree in which nodes represent either a single letter or a combination of letters. Initially all letters are placed in a min-priority queue, where the key is the frequency. The algorithms proceeds by combining two nodes with the least frequencies into one that has the combined frequency, and adding it back into the queue. The process continues until there is only one combined node which is the root of the trree. Then each left branch in the tree is denoted by 0 and the right branch by 1. The codes for each letter are the paths to them in the tree.
The textbook or the compression section of the wikipedia page provide more details.

The encoding for the example above is:

Letter: a b c d e
Encoding: 1 00 011 0101 0100

As an exercise, encode the strings ace and add according to the above encoding. Then decode the strings 101011 and 0010101. Note how prefix-free encoding makes decoding easy.

Task for this part of the lab

Your goal is to write a program that reads in an alphabet with letter frequencies and produces Huffman codes for this letters. It then allows you to encode and decode texts in this alpahbet using the Huffman encoding you have constructed. More specifically:

  1. Your program must read data in the format (on one line):
          a 51 b 22 c 15 d 7 e 5
        
    The frequences are assumed to be in decreasing order and add up to 100%. New line signals end of input.
  2. It should output encodings for each letter.
  3. It should then take inputs in the format:
          e add
        
    Where the first letter is "e" for "encode" and "d" for "decode", and the following word is what you would like to encode or decode. Spaces and other symbols outside the alphabet should be ignored.
  4. Entering letter x on the line by itself terminates the program.

When submitting your program, please explain what its efficiency is for finding Huffman codes (where n is the number of letters in the alphabet). Don't forget to include the efficiency of data structures that you are using, if any.

What to submit


CSci 3501 course web site.

The views and opinions expressed in this page are strictly those of the page author. The contents of this page have not been reviewed or approved by the University of Minnesota.