EECS 485 Lab
Lab 10: Project 5 Setup
Goals
- Set up a local development environment, version control, and a Python virtual environment
- Set up skeleton code for Project 5, a Wikipedia search engine
Project 5 Setup
Lab 10 is the Project 5 setup tutorial. Stop when you get to “Inverted Index with MapReduce”.
Completion Criteria
- Have a clean remote and local git repo with a .gitignore, starter files, and Python package skeleton (__init__.py and __main__.py)
Check
$ pwd
/Users/awdeorio/src/eecs485/p5-search-engine
$ tree --matchdirs -I 'env|__pycache__|node_modules'
.
├── bin
│   ├── index
│   ├── indexdb
│   └── search
├── hadoop
│   ├── hadoop-streaming-2.7.2.jar
│   ├── inverted_index
│   │   ├── input.txt
│   │   ├── input_split.py
│   │   ├── map0.py
│   │   ├── map1.py
│   │   ├── ...
│   │   ├── output_sample.txt
│   │   ├── pipeline.sh
│   │   ├── reduce0.py
│   │   ├── reduce1.py
│   │   ├── ...
│   │   └── stopwords.txt
│   └── word_count
│       ├── input
│       │   ├── file01
│       │   └── file02
│       ├── map.py
│       └── reduce.py
├── index
│   ├── index
│   │   ├── __init__.py
│   │   ├── api
│   │   │   └── *.py
│   │   ├── inverted_index.txt
│   │   ├── pagerank.out
│   │   └── stopwords.txt
│   └── setup.py
├── search
│   ├── package-lock.json
│   ├── package.json
│   ├── search
│   │   ├── __init__.py
│   │   ├── api
│   │   │   └── *.py
│   │   ├── config.py
│   │   ├── js
│   │   │   └── *.jsx
│   │   ├── sql
│   │   │   └── wikipedia.sql
│   │   ├── static
│   │   │   └── js
│   │   │       └── bundle.js
│   │   ├── templates
│   │   │   └── *.html
│   │   ├── var
│   │   │   └── wikipedia.sqlite3
│   │   └── views
│   │       └── *.py
│   ├── setup.py
│   └── webpack.config.js
└── tests
- Have an activated Python virtual environment (source env/bin/activate to activate)
Check
$ pwd
/Users/awdeorio/src/eecs485/p5-search-engine
$ source env/bin/activate
$ which python
/Users/awdeorio/src/eecs485/p5-search-engine/env/bin/python
- Be able to run the “Hello World” word count example with Hadoop
Check
$ pwd
/Users/awdeorio/src/eecs485/p5-search-engine/hadoop/word_count
$ hadoop \
  jar ../hadoop-streaming-2.7.2.jar \
  -input input \
  -output output \
  -mapper ./map.py \
  -reducer ./reduce.py
Starting map stage
+ ./map.py < output/hadooptmp/mapper-input/part-00000 > output/hadooptmp/mapper-output/part-00000
+ ./map.py < output/hadooptmp/mapper-input/part-00001 > output/hadooptmp/mapper-output/part-00001
Starting group stage
+ cat output/hadooptmp/mapper-output/* | sort > output/hadooptmp/grouper-output/sorted.out
Starting reduce stage
+ ./reduce.py < output/hadooptmp/grouper-output/part-00000 > output/hadooptmp/reducer-output/part-00000
+ ./reduce.py < output/hadooptmp/grouper-output/part-00001 > output/hadooptmp/reducer-output/part-00001
Output directory: output
- Register your group on the Project 5 autograder.
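The Hadoop “Hello World” check in the criteria above runs three stages: a map stage, a group stage that sorts the mapper output, and a reduce stage. The following is a minimal in-memory sketch of that flow for word counting, not the contents of the starter map.py and reduce.py, which this page does not show. The function names (map_words, reduce_counts, word_count) and the sample input lines are hypothetical, and the real scripts read from stdin and write to stdout rather than passing Python objects.

```python
import itertools


def map_words(lines):
    """Map stage: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reduce stage: sum the counts for each run of identical keys.

    The pairs must already be sorted by key, which is the job of the
    group stage (the `sort` step in the log output).
    """
    for word, group in itertools.groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Chain the three stages: map, group (sort), reduce."""
    mapped = map_words(lines)
    grouped = sorted(mapped)  # stands in for `cat ... | sort`
    return dict(reduce_counts(grouped))


if __name__ == "__main__":
    # Made-up input lines; the contents of file01 and file02 are not shown here.
    sample = ["hello world", "goodbye world"]
    print(word_count(sample))
```

Sorting before reducing matters because itertools.groupby only merges adjacent equal keys. Without the group stage, the same word appearing in two different mapper outputs would be counted as separate runs.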
Lab Quiz
Complete the lab quiz by the due date.