What to submit for the Hadoop exercise:

  1. Problem statement (describe the problem you are trying to solve).
  2. Detailed description of the data (format, number of files, etc.) and a data sample.
  3. All of your code, plus documentation explaining the key/value pairs at each stage and all sorting/ordering that happens during the process.
  4. The exact commands you used to load the data files and run your program.
  5. The entire output file; explain its format.
  6. Any observations you made while working on the problem and any difficulties you ran into. Was your problem well suited to Hadoop? If not, why not?

