correcting note on static file hosting for dev

[oweals/karmaworld.git] / README.md
diff --git a/README.md b/README.md

index 7c592aab7e2cb9890889a0bb3f4de3b2cf5ba049..70034ec16af1e5fdbe10555f906d85d6fc796600 100644 (file)
--- a/README.md
+++ b/README.md
@@ -11,7 +11,14 @@ v3.0 of the karmanotes.org website from the FinalsClub Foundation
  
  # Purpose
  
-KarmaNotes is an online database of college lecture notes.  KarmaNotes empowers college students to participate in the free exchange of knowledge. 
+KarmaWorld is an online database of college lecture notes.  KarmaWorld
+empowers college students to participate in the free exchange of knowledge.
+
+# Naming
+
+The repository and the project are called KarmaWorld. One implementation
+of KarmaWorld, which is run by FinalsClub Foundation, is called
+[KarmaNotes](https://www.karmanotes.org/).
  
  # Pre-Installation
  
@@ -38,63 +45,298 @@ directory underneath that (`{project_root}/karmaworld`) alongside files like
  `fabfile.py` (`{project_root}/fabfile.py`) and `README.md`
  (`{project_root}/README.md`).
  
+## External Software Dependencies
+
+### pdf2htmlEX
+
+KarmaWorld uses [pdf2htmlEX](https://github.com/coolwanglu/pdf2htmlEX) as
+a dependency. pdf2htmlEX is used to convert uploaded PDF notes into HTML.
+
+An [outdated version of pdf2htmlEX](https://github.com/FinalsClub/pdf2htmlEX)
+is available which includes the
+[patch](https://github.com/FinalsClub/pdf2htmlEX/commit/3c19f6abd8d59d1b58cf254b7160b332b3f5b517)
+required for pdf2htmlEX to correctly work with KarmaWorld.
+
+Newer versions can be used by applying the patch by hand. It's a fairly
+simple two-line modification that can be done after installing
+pdf2htmlEX.
+
+### SSL Certificate
+
+If you wish to host your system publicly, you'll almost certainly want
+an SSL certificate signed by a proper authority.
+
+You may need to set the `SSL_REDIRECT` environment variable to `true` to
+make KarmaWorld redirect insecure connections to secure ones.
+
+Follow [Heroku's SSL setup](https://devcenter.heroku.com/articles/ssl-endpoint)
+to get SSL running on your server with Heroku.
+
  ## External Service Dependencies
  
-Notice: This software makes use of external third party services which require
+Notice: A number of services are required even if running the KarmaWorld web
+service [locally](#local). Some of the services are recommended, and some are
+completely optional even if running the web service on Heroku.
+
+This software makes use of external third party services which require
  accounts to access the service APIs. Without these third parties available,
-this software may require considerable overhaul.
+this software may require considerable overhaul. These services have API keys,
+credentials, and other information that you must provide to KarmaWorld
+as environment variables.
+
+The best way to persist these API keys in environment variables is by using a
+`.env` file.  Copy `.env.example` to `.env` and populate the fields as required.
+
+Many of these services have free tiers and can be used without charge for
+development testing purposes.
+
+* Reminder
+  * Copy `.env.example` to `.env` and populate the environment variables there.
+* Required Services
+  * [Google Drive](#google-drive)
+  * [Filepicker](#filepicker)
+  * [PostgreSQL](#postgresql)
+  * [Celery](#celery-queue)
+* Optional but recommended
+  * [IndexDen](#indexden): enables searching through courses, notes, etc
+  * [Heroku](#heroku): the production environment used by karmanotes.org
+    * it might not be possible to run KarmaWorld on Heroku using a free
+      webapp.
+  * [Amazon S3](#s3-for-static-files): for static file hosting
+* Entirely optional (though used in the production environment)
+  * [Twitter](#twitter): share updates about new uploads
+  * [Amazon Mechanical Turk](#amazon-mechanical-turk): generate quizzes, flashcards, etc
+  * [Amazon CloudFront](#amazon-cloudfront-cdn)
+  * [Amazon S3](#s3-for-filepicker): store files uploaded to Filepicker
+    * Filepicker does not support S3 storage in its free tier
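Since every service above is configured through environment variables, a quick pre-flight check can save a confusing startup failure. The following is a minimal sketch, not part of KarmaWorld: the helper `missing_vars` and the `REQUIRED_VARS` list are hypothetical, and the variable names are simply the ones this README mentions for the required services.

```python
import os

# Variable names are taken from this README; treating exactly these as
# "required" is an assumption of this sketch, not project behaviour.
REQUIRED_VARS = [
    "GOOGLE_USER",
    "GOOGLE_SERVICE_EMAIL",
    "GOOGLE_SERVICE_KEY_BASE64",
    "FILEPICKER_API_KEY",
    "FILEPICKER_SECRET",
    "CLOUDAMQP_URL",
    "CELERY_QUEUE_NAME",
]


def missing_vars(env, names=REQUIRED_VARS):
    """Return the names that are unset or empty in the mapping `env`."""
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    # Report anything the current environment is missing.
    for name in missing_vars(os.environ):
        print("missing: " + name)
```

Running it under `foreman run` would check the values loaded from `.env` rather than the bare shell environment.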
+
+### Heroku
+This project has chosen to use [Heroku](https://www.heroku.com) to host the Django and
+celery software. While not a hard requirement, the more up-to-date parts of this
+documentation will operate assuming Heroku is in use.
+
+See README.heroku for more information.
+
+#### pdf2htmlEX on Heroku
+If using Heroku, the default
+[KarmaNotes Heroku buildpack](https://github.com/FinalsClub/heroku-buildpack-karmanotes)
+will [include](https://github.com/FinalsClub/heroku-buildpack-karmanotes/blob/master/bin/steps/pdf2htmlex)
+the [required version of pdf2htmlEX](#pdf2htmlex).
+
+### Celery Queue
+Celery uses the Advanced Message Queuing Protocol (AMQP) for passing messages to its workers.
+
+For production, we recommend using Heroku's CloudAMQP add-on, getting your own CloudAMQP account, or
+running a queueing system on your own. The `CLOUDAMQP_URL` environment variable must be set correctly
+for KarmaWorld to be able to use Celery. The `CELERY_QUEUE_NAME` environment variable
+must be set to the name of the queue you wish to use. Setting this to something unique
+allows multiple instances of KarmaWorld (or some other software) to share the same queueing server.
+
+For development on localhost, `RabbitMQ` is the default for `djcelery` and is well supported. Ensure
+`RabbitMQ` is installed for local development.
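A malformed `CLOUDAMQP_URL` is an easy mistake to make, so it can help to split the URL and look at its parts before starting Celery. This is a sketch under assumptions: `describe_broker` is a hypothetical helper written for Python 3, and the example URL in the usage note is illustrative only.

```python
from urllib.parse import urlsplit


def describe_broker(url):
    """Split an AMQP URL into the parts worth checking by eye."""
    parts = urlsplit(url)
    if parts.scheme not in ("amqp", "amqps"):
        raise ValueError("expected an amqp:// or amqps:// URL")
    return {
        "host": parts.hostname,
        "port": parts.port,  # None when the URL omits the port
        "user": parts.username,
        "vhost": parts.path.lstrip("/"),
    }
```

For example, `describe_broker("amqp://guest:guest@localhost:5672/dev")` reports the host, port, user, and virtual host without echoing the password.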
+
+### PostgreSQL
+
+PostgreSQL is not necessarily required; other RDBMS could probably be fit into
+place. However, the code was largely written assuming PostgreSQL will be used.
+Change to another system with the caveat that it might take some work.
+
+There are many cloud providers which provide PostgreSQL databases. Heroku has
+an add-on for providing a PostgreSQL database. Ensure something like this
+is made available and installed to the app.
+
+For local development, ensure a PostgreSQL server is running on localhost or is
+otherwise accessible.
  
-### Filepicker
-This software uses [Filepicker.io](https://www.inkfilepicker.com/) for uploading
-files. This requires an account with Filepicker.
+### Amazon S3
+The instructions for creating an [S3](http://aws.amazon.com/s3/) bucket may be
+[found on Amazon.](http://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html)
  
-Filepicker requires an additional third party file hosting site where it may
-send uploaded files. For this project, we have used Amazon S3.
+Two separate buckets may be used in production: one for static file hosting
+and one as a communication bus with Filepicker.
  
-Filepicker will provide an API key. This is needed by the software.
+#### S3 for Filepicker
  
-### Amazon S3
+This software uses S3 to store files which are sent to or received 
+from Filepicker. Filepicker will need to know the S3 bucket name, access key,
+and secret key.
+
+Filepicker users can only make use of an S3 bucket with a paid account. For
+development purposes, no Filepicker S3 bucket is needed. Skip all references to
+the Filepicker S3 bucket in the development case.
+
+The software will not need to know the S3 credentials for the Filepicker
+bucket, because the software will upload files to the Filepicker S3 bucket
+through Filepicker's API and it will link to or download files from the
+Filepicker S3 bucket through Filepicker's URLs. This will be covered in the
+[Filepicker section](#filepicker).
+
+#### S3 for static files
+
+This software uses S3 for hosting static files. The software will need to
+update static files on the S3 bucket. As such, the software will need the
+S3 bucket name, access key, and secret key via the environment variables. This
+is described in subsections below.
+
+To support static hosting, `DEFAULT_FILE_STORAGE` should be set to
+`'storages.backends.s3boto.S3BotoStorage'`, unless there is a compelling reason
+to change it.
+
+There are three ways to set up access to the S3 buckets, trading setup speed
+against security. The more secure the option, the longer it takes to set up.
+
+#### insecure S3 access
+For quick and dirty insecure S3 access, create a single group and a single user
+with full access to all buckets. Full access to all buckets is insecure!
+
+Create an 
+[Amazon IAM group](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_CreatingAndListingGroups.html)
+with full access to the S3 bucket. Select the "Amazon S3 Full Access" Policy
+Template.
+
+Create an
+[Amazon IAM user](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html).
+Copy the credentials into the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+environment variables. Be sure to write down the access information, as it
+will only be shown once.
+
+#### secure S3 access
+For secure S3 access, two users will be needed. One with access to the
+Filepicker bucket and one with access to the static hosting bucket.
+
+Note: this might need to be modified to prevent creation and deletion of
+buckets?
+
+Create an 
+[Amazon IAM group](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_CreatingAndListingGroups.html)
+with full access to the S3 bucket. The quick way is to select the
+"Amazon S3 Full Access" Policy Template and replace `"Resource": "*"` with 
+`"Resource": "arn:aws:s3:::<static_bucket_name>"`.
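The edited policy can also be generated rather than typed by hand. The sketch below builds such a policy document; `static_bucket_policy` and the bucket name are hypothetical. Note one deliberate difference from the text above: S3 matches object-level actions against `arn:aws:s3:::<bucket>/*`, so the sketch lists both the bucket ARN and the object ARN. Treat that as this sketch's addition rather than something the project documents.

```python
import json


def static_bucket_policy(bucket_name):
    """Build an IAM policy document scoped to a single S3 bucket."""
    bucket_arn = "arn:aws:s3:::" + bucket_name
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                # The bucket ARN covers bucket-level calls;
                # the /* ARN covers the objects inside it.
                "Resource": [bucket_arn, bucket_arn + "/*"],
            }
        ],
    }


if __name__ == "__main__":
    # "example-static-bucket" is a placeholder name.
    print(json.dumps(static_bucket_policy("example-static-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM group's policy editor in place of the template text.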
+
+Create an
+[Amazon IAM user](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html).
+Copy the credentials into the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+environment variables. Be sure to write down the access information, as it
+will only be shown once.
+
+Ensure the created user is a member of the group with access to the S3
+static files bucket.
+
+Repeat the process again, creating a group for the Filepicker bucket and
+creating a user with access to that group. These credentials will be passed
+on to Filepicker.
+
+#### somewhat secure S3 access
+Create two groups as described in the `secure S3 access` section above.
+
+Create a single user, save the credentials as described in the
+`insecure S3 access` section above, and pass the credentials on to Filepicker.
+
+Add the single user to both groups.
  
-#### for Filepicker
-This software uses [Amazon S3](http://aws.amazon.com/s3/) as a third party file
-hosting site. The primary use case is a destination for Filepicker files. The
-software won't directly need any S3 information for this use case; it will be
-provided directly to Filepicker.
+This is less secure because if your web server or Filepicker gets compromised
+(so there are two points for potential failure), the single compromised
+user has full access to both buckets.
  
-#### for Static File hosting
-A secondary use case for S3 is hosting static files. The software will need to
-update static files on the S3 bucket. In this case, the software will need the
-S3 bucket name, access key, and secret key.
+### Amazon Cloudfront CDN
+[Cloudfront CDN](http://aws.amazon.com/cloudfront/) assists static file hosting.
  
-The code assumes S3 is used for static files in a production environment. To
-obviate the need for hosting static files through S3 (noting that it still might
-be necessary for Filepicker), a workaround was explained [in this Github ticket](https://github.com/FinalsClub/karmaworld/issues/192#issuecomment-30193617).
+Follow
+[Amazon's instructions](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/GettingStarted.html)
+to host static files out of the appropriate S3 bucket. Note that Django's static
+file upload process has been modified to mark static files as publicly
+accessible.
  
-That workaround is repeated here. Make the following changes to
-`{project_root}/karmaworld/settings/prod.py`:
+In the settings for the Cloudfront Distribution, copy the "Domain Name" from
+General settings and set `CLOUDFRONT_DOMAIN` to it. For example, `abcdefghij.cloudfront.net`.
  
-1. comment out everything about static_s3 from imports
-2. comment out storages from the `INSTALLED_APPS`
-3. change `STATIC_URL` to `'/assets/'`
-4. comment out the entire storages section (save for part of `INSTALLED_APPS` and `STATIC_URL`)
-5. add this to the nginx config:
+### Amazon Mechanical Turk
+Mechanical Turk is employed to generate human feedback from uploaded notes.
+This service is helpful for generating flash cards and quizzes.
  
-    location /assets/ {
-        root /var/www/karmaworld/karmaworld/;
-    }
+This service is optional and it might cause unexpected charges when
+deployed.  If the required environment variable is not found,
+then no errors will occur and no Mechanical Turk tasks will be created, avoiding any unexpected
+costs.
+
+The `MTURK_HOST` environment variable is almost certainly
+`"mechanicalturk.amazonaws.com"`.
+
+The code will create and publish HITs on your behalf.
  
  ### Google Drive
  This software uses [Google Drive](https://developers.google.com/drive/) to
  convert documents to and from various file formats.
  
-A Google Drive service account with access to the Google Drive is required. Thismay be done with a Google Apps account with administrative privileges, or ask
+A Google Drive service account with access to the Google Drive is required.
+This may be done with a Google Apps account with administrative privileges, or ask
  your business sysadmin.
  
-These are the instructions to create a Google Drive service account:
-https://developers.google.com/drive/delegation
+Follow [Google's instructions](https://developers.google.com/drive/web/auth/web-server)
+to create a Google Drive service account. If using Google Apps, it is worth
+looking at [these instructions](https://developers.google.com/drive/delegation).
+
+Populate the `GOOGLE_USER` environment variable with the email address of the
+user whose Google Drive will be accessed. This is typically your own email
+address.
+
+Google Drive used to use p12 files by default. Now a new-style JSON file is
+downloaded by default when creating new credentials. Until the code has been
+[updated](https://github.com/FinalsClub/karmaworld/issues/437) to use the
+new-style JSON file, make sure to click the `Generate a new P12 key` button.
+
+While on the Credentials page (with the `Generate a new P12 key` button
+visible), note the Service account Email address. It will have a format like
+`numbers-alphanumerics@developer.gserviceaccount.com`. Copy this value and
+paste it into the `GOOGLE_SERVICE_EMAIL` environment variable.
+
+Convert the p12 file into a Base64 encoded string for the
+`GOOGLE_SERVICE_KEY_BASE64` environment variable. There are many ways to do
+this. If Python is available, the
+[binascii library](https://docs.python.org/2/library/binascii.html#binascii.b2a_base64)
+makes this very easy:
+
+        import binascii
+        # read the key as bytes ('rb') and call read() to get its contents
+        with open('file.p12', 'rb') as f:
+            print(binascii.b2a_base64(f.read()).decode('ascii'))
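One detail worth knowing about the snippet above: `binascii.b2a_base64` appends a trailing newline to its output, which may be awkward when pasting the value into a single-line `.env` entry. A possible alternative, sketched here for Python 3 with hypothetical helper names, uses `base64.b64encode`, which emits no newline.

```python
import base64


def encode_key_bytes(data):
    """Return `data` as a single-line Base64 string (no trailing newline)."""
    return base64.b64encode(data).decode("ascii")


def encode_key(path):
    """Read a key file in binary mode and Base64-encode its contents."""
    with open(path, "rb") as f:
        return encode_key_bytes(f.read())
```

Calling `print(encode_key('file.p12'))` would then produce a value suitable for `GOOGLE_SERVICE_KEY_BASE64`.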
+
+### Filepicker
+This software uses [Filepicker](https://www.filepicker.com/) for uploading
+files. This requires an account with Filepicker.
+
+Filepicker can use an additional third party file hosting site where it may
+send uploaded files. This project, in production, uses Amazon S3 as the third
+party. See the Amazon S3 section above for more information.  
+
+In development, an S3 bucket will not be necessary. The Free Plan should
+suffice.
+
+Create a new App with Web SDK and provide the Heroku App URL for the
+Application's URL. You'll be given an API Key for the App. Paste this into the
+`FILEPICKER_API_KEY` environment variable.
+
+Find the 'App Security' button on the left hand side of the web site. Make sure
+'Use Security' is enabled. Generate a new app secret. It might require
+reloading the page to see the new secret. Paste this secret into the
+`FILEPICKER_SECRET` environment variable.
+
+If you have an upgraded plan, you can configure Filepicker to have access to
+your Filepicker S3 bucket. Click 'Amazon S3' on the left hand side menu and
+supply the credentials for the user with access to the Filepicker S3 bucket.
+
+### IndexDen
+KarmaWorld uses IndexDen to create a searchable index of all the notes in the
+system. Create a free IndexDen account at
+[their homepage](http://indexden.com/). You will be given a private URL that
+accesses your IndexDen account. This URL is visible on your dashboard (you
+might need to scroll down).
+
+Set the `INDEXDEN_PRIVATE_URL` environment variable to your private URL.
  
-When completed, you'll have a file called `client_secrets.json` and a p12 file
-which is the key to access the service account. Both are needed by the software.
+Set the `INDEXDEN_INDEX` environment variable to the name of the index you want
+to use for KarmaWorld. The index will be created automatically when KarmaWorld
+is run if it doesn't already exist. It may be created through the GUI if
+desired.
  
  ### Twitter
  
@@ -104,225 +346,148 @@ will be required for this task.
  If this Twitter feature is desired, the consumer key and secret as well as the
  access token key and secret are needed by the software.
  
-If the required files are not found, then no errors will occur.
+If the required environment variables are not found, then no errors will occur
+and no tweets will be posted.
  
-# Development Install
+To set this up,
+[create a new Twitter application](https://dev.twitter.com/apps/new).
+Use your Heroku App URL for the website field. Leave the Callback field blank.
  
-If you need to setup the project for development, it is highly recommend that
-you grab create a development virtual machine or (if available) grab one that
-has already been created for your site.
+Make sure this application has read/write access. Generate an access token. Go
+to your OAuth settings, and grab the "Consumer key", "Consumer secret",
+"Access token", and "Access token secret". Paste these, respectively, into the
+environment variables `TWITTER_CONSUMER_KEY`, `TWITTER_CONSUMER_SECRET`,
+`TWITTER_ACCESS_TOKEN_KEY`, `TWITTER_ACCESS_TOKEN_SECRET`.
  
-The *host machine* is the system which runs e.g. VirtualBox, while the
-*virtual machine* refers to the system running inside e.g. VirtualBox. 
-
-## Creating a Virtual Machine by hand
+# Local
  
-Create a virtual machine with your favorite VM software. Configure the virtual
-machine for production with the steps shown in the [Production Install](#production-install) section.
+## Configuring foreman
  
-## Creating a Virtual Machine with Vagrant
+KarmaNotes runs on Heroku as a webapp and thus makes use of a Procfile. While
+not strictly necessary, KarmaWorld can use the same basic Procfile which is
+convenient and consistent.
  
-Vagrant supports a variety of virtual machine software and there is additional
-support for Vagrant to deploy to a wider variety. However, for these
-instructions, it is assumed Vagrant will be deployed to VirtualBox.
+To use the Procfile locally, we recommend using `foreman`. To install `foreman`
+and other Heroku tools, install the
+[Heroku toolbelt](https://toolbelt.heroku.com/).
  
-1. Configure external dependencies on the host machine:
-   * Under `{project_root}/karmaworld/secret/`:
-        1. Copy files with the example extension to the corresponding filename
-          without the example extension (e.g.
-          `cp filepicker.py.example filepicker.py`)
-        1. Modify those files, but ignore `db_settings.py` (Vagrant takes care of that one)
-        1. Copy the Google Drive service account p12 file to `drive.p12`
-           (this filename and location may be changed in `drive.py`)
-        1. Ensure `*.py` in `secret/` are never added to the git repo.
-           (.gitignore should help warn against taking this action)
-
-1. Install [VirtualBox](http://www.virtualbox.com/)
-
-1. Install [vagrant](http://www.vagrantup.com/) 1.3 or higher
-
-1. Use Vagrant to create the virtual machine.
-   * While in `cd {project_root}`, type `vagrant up`
-
-1. Connect to the virtual machine with `vagrant ssh`
-
-Note:
-Port 80 of the virtual machine will be configured as port 6659 on the host
-system. While on the host system, fire up your favorite browser and point it at
-`http://localhost:6659/`. This connects to your host system on port 6659, which
-forwards to your virtual machine's web site.
-
-## Completing the Virtual Machine with Fabric
-
-*Notice* Fabric might not run properly if you presently in a virtualenv.
-`deactivate` prior to running fab commands.
-
-1. On the virtual machine, type `cd karmanotes` to get into the code repository.
-
-1. In the code repo of the VM, type `fab -H 127.0.0.1 first_deploy`
-
-   During this process, you will be queried to create a Django site admin.
-   Provide information. You will be asked to remove duplicate schools. Respond
-   with yes.
+Ensure environment variables are available to `foreman` by copying
+`.env.example` to `.env` and updating those variables as appropriate for your
+local system.
  
-# Production Install
+## pdf2htmlEX
  
-These steps are taken care of by automatic utilities. Vagrant performs the
-first subsection of these instructions and Fabric performs the second
-subsection. These instructions are detailed here for good measure, but should
-not generally be needed.
+This project uses [pdf2htmlEX](https://github.com/coolwanglu/pdf2htmlEX) as
+a dependency. pdf2htmlEX is used to convert uploaded PDF notes into HTML. It
+needs to be installed on the same system that KarmaWorld is running on.
  
-1. Ensure the following are installed:
-   * `git`
-   * `7zip` (for unzipping US Department of Education files)
-   * `PostgreSQL` (server and client)
-   * `nginx`
-   * `libxslt` and `libxml2` (used by some Python libraries)
-   * `RabbitMQ` (server)
-   * `memcached`
-   * `Python`
-   * `PIP`
-   * `virtualenv`
-   * `virtualenvwrapper` (might not be needed anymore)
-   * `pdf2htmlEX`
+### using their source
  
-   On a Debian system supporting Apt, this can be done with:
+See their instructions at
+[https://github.com/coolwanglu/pdf2htmlEX/wiki/Building](https://github.com/coolwanglu/pdf2htmlEX/wiki/Building).
  
-        sudo apt-get install python-pip postgresql python-virtualenv nginx \
-                             virtualenvwrapper git libxml2-dev p7zip-full \
-                             postgresql-server-dev-9.1 libxslt1-dev \
-                             libmemcached-dev python-dev rabbitmq-server
+Make sure to [patch](https://github.com/FinalsClub/pdf2htmlEX/commit/3c19f6abd8d59d1b58cf254b7160b332b3f5b517)
+the source code to expose two variables.
  
-        sudo add-apt-repository ppa:coolwanglu/pdf2htmlex
-        sudo apt-get install pdf2htmlex
+### using our fork
  
-1. Generate a PostgreSQL database and a role with read/write permissions.
-   * For Debian, these instructions are helpful: https://wiki.debian.org/PostgreSql
+You can use FinalsClub's [outdated version of pdf2htmlEX](https://github.com/FinalsClub/pdf2htmlEX).
+See their installation instructions above, but don't worry about patching.
  
-1. Modify configuration files.
-   * There are settings in `{project_root}/karmaworld/settings/prod.py`
-       * Most of the setting should work fine by default.
-   * There are additional configuration options for external dependencies
-     under `{project_root}/karmaworld/secret/`.
-        1. Copy files with the example extension to the corresponding filename
-          without the example extension (e.g.
-          `cp filepicker.py.example filepicker.py`)
-        1. Modify those files.
-           * Ensure `PROD_DB_USERNAME`, `PROD_DB_PASSWORD`, and `PROD_DB_NAME`
-             inside `db_settings.py` match the role, password, and database
-             generated in the previous step.
-        1. Copy the Google Drive service account p12 file to `drive.p12`
-           (this filename and location may be changed in `drive.py`)
-        1. Ensure `*.py` in `secret/` are never added to the git repo.
-           (.gitignore should help warn against taking this action)
-
-1. Make sure that /var/www exists, is owned by the www-data group, and that
-   the desired user is a member of the www-data group.
-
-1. Configure nginx with a `proxy_pass` to port 8000 (or whatever port gunicorn
-   will be running the site on) and any virtual hosting that is desired.
-   Here is an example server file to put into `/etc/nginx/sites-available/`
-
-        server {
-            listen 80;
-            # don't do virtual hosting, handle all requests regardless of header
-            server_name "";
-            client_max_body_size 20M;
-        
-            location / {
-                # pass traffic through to gunicorn
-                proxy_pass http://127.0.0.1:8000;
-            }
-        }
-
-1. Configure the system to start supervisor on boot. An init script for
-   supervisor is in the repo at `{project_root}/karmaworld/confs/supervisor`.
-   `update-rc.d supervisor defaults` is the Debian command to load the init
-   script into the correct directories.
-
-1. Make sure `{project_root)/var/log` and `{project_root}/var/run` exist and
-   may be written to, or else put the desired logging and run file paths into
-   `{project_root}/confs/prod/supervisord.conf`
-
-1. Create a virtualenv under `/var/www/karmaworld/venv`
-
-1. Change into the virtualenv with `. /var/www/karmaworld/venv/bin/activate`.
-   Within the virtualenv:
-
-    1. Update the Python depenencies with `pip -i {project_root}/reqs/prod.txt`
-    
-    1. Setup the database with `python {project_root}/manage.py syncdb --migrate`
-
-    1. Collect static resources and put them in the static hosting location with
-       `python {project_root}/manage.py collect_static`
-
-1. The database needs to be populated with schools. A list of accredited schools
-   may be found on the US Department of Education website:
-   http://ope.ed.gov/accreditation/GetDownloadFile.aspx
-
-   Alternatively, use the built-in scripts while in the virtualenv:
+### using their PPA
  
-   1. Fetch USDE schools with
-      `python {project_root}/manage.py fetch_usde_csv ./schools.csv`
+You can use [their upstream PPA](https://launchpad.net/~coolwanglu/+archive/ubuntu/pdf2htmlex).
  
-   1. Upload the schools into the database with
-      `python {project_root}/manage.py import_usde _csv ./schools.csv`
+        apt-add-repository ppa:coolwanglu/pdf2htmlex
+        apt-get update
+        apt-get install pdf2htmlex
  
-   1. Clean up redundant information with
-      `python {project_root}/manage.py sanitize_usde_schools`
+Then patch the javascript on your system by running this code in the shell.
  
-1. Startup `supervisor`, which will run `celery` and `gunicorn`. This may be
-   done from within the virtualenv by typing
-   `python {project_root}/manage.py start_supervisord`
+        cat >> `dpkg -L pdf2htmlex | grep pdf2htmlEX.js` <<PDF2HTMLEXHACK
+        Viewer.prototype['rescale'] = Viewer.prototype.rescale;
+        Viewer.prototype['scroll_to'] = Viewer.prototype.scroll_to;
+        PDF2HTMLEXHACK
  
-1. If everything went well, gunicorn should be running the website on port 8000
-   and nginx should be serving gunicorn on port 80.
+## Install
  
-# Accessing the Vagrant Virtual Machine
+  1. `virtualenv venv`
+  1. `source venv/bin/activate`
+  1. `pip install -r requirements.txt`
+    * on Debian systems, some packages are required for pip to succeed:
+    * `apt-get install python-dev libpython-dev python-psycopg2 libmemcached-dev libffi-dev libssl-dev postgresql-server-dev-X.Y libxml2-dev libxslt-dev`
+  1. `pip install -r requirements-dev.txt`
  
-## Connecting to the VM via SSH
-If you have installed a virtual machine using `vagrant up`, you can connect
-to it by running `vagrant ssh` from `{project_root}`.
+## Configuration
  
-## Connecting to the development website on the VM
-To access the website running on the VM, point your browser at
-http://localhost:6659/ using your host computer.
+Make sure [External Service Dependencies](#external-service-dependencies) are
+satisfied. This includes running a local database and RabbitMQ instance as
+desired.
  
-Port 6659 on your local machine is set to forward to the VM's port 80.
+  1. configure `.env` as per [instructions](#external-service-dependencies)
+  1. `foreman run python manage.py syncdb --migrate --noinput`
+  1. `foreman run python manage.py createsuperuser`
+  1. `foreman run python manage.py fetch_usde_csv ./schools.csv`
+  1. `foreman run python manage.py import_usde_csv ./schools.csv`
+  1. `foreman run python manage.py sanitize_usde_schools`
  
-Fun fact: 6659 was chosen because of OM (sanskrit) and KW (KarmaWorld) on a
-phone: 66 59.
+* `fetch_usde_csv` downloads school records and stores them to `./schools.csv`. This file name
+     and location are arbitrary. As long as the same file is passed into `import_usde_csv` for
+     reading, everything should be fine.
  
-## Updating the VM code repository
-Once connected to the virtual machine by SSH, you will see `karmaworld` in
-the home directory. That is the `{project_root}` in the virtual machine.
+* `fetch_usde_csv` requires `7zip` to be installed for processing compressed
+     archives. On Debian-based systems, this entails `apt-get install p7zip-full`.
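The configuration steps above can be collected into one place so that the CSV path is guaranteed to be the same for the fetch and import commands. This is only a dry-run sketch: `setup_commands` is a hypothetical helper, and it prints the commands listed in this README instead of executing them.

```python
import shlex


def setup_commands(csv_path="./schools.csv"):
    """Return the first-run commands from this README, in order."""
    manage = ["foreman", "run", "python", "manage.py"]
    return [
        manage + ["syncdb", "--migrate", "--noinput"],
        manage + ["createsuperuser"],
        # fetch and import must be given the same file
        manage + ["fetch_usde_csv", csv_path],
        manage + ["import_usde_csv", csv_path],
        manage + ["sanitize_usde_schools"],
    ]


if __name__ == "__main__":
    for cmd in setup_commands():
        print(" ".join(shlex.quote(part) for part in cmd))
```

To actually run the steps, each list could be passed to `subprocess.check_call`, which stops at the first failing command.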
  
-`cd karmaworld` and then use `git fetch; git merge` and/or `git pull origin` as
-desired.
+If using `DJANGO_SETTINGS_MODULE='karmaworld.settings.dev'` in `.env`, static
+file hosting should be done by local files.  `DEFAULT_FILE_STORAGE` should be
+set to `django.core.files.storage.FileSystemStorage`.
+
+If using `DJANGO_SETTINGS_MODULE='karmaworld.settings.prod'` in `.env`, static
+file hosting is done by `DEFAULT_FILE_STORAGE` defined in `.env`.
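The two cases above can be summarised in a few lines. This is a sketch of the rule as this README states it, not the project's settings code: `expected_file_storage` is a hypothetical helper, and falling back to the S3 backend when `.env` leaves the value unset is an assumption based on the recommendation in the S3 section.

```python
DEV_STORAGE = "django.core.files.storage.FileSystemStorage"
S3_STORAGE = "storages.backends.s3boto.S3BotoStorage"


def expected_file_storage(settings_module, env_value=None):
    """Return the DEFAULT_FILE_STORAGE value this README suggests."""
    if settings_module == "karmaworld.settings.dev":
        # dev: static files are served from the local file system
        return DEV_STORAGE
    # prod: whatever .env defines, assumed to default to the S3 backend
    return env_value or S3_STORAGE
```

Comparing its result with the value actually present in `.env` is a cheap way to catch a dev setup that still points at S3.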
+
+## Run
+
+Make sure you are inside your virtual environment (`source venv/bin/activate`).
  
-The virtual machine's code repository is set to use your host machine's
-local repository as the origin. So if you make changes locally and commit them,
-without pushing them anywhere, your VM can pull those changes in for testing.
+If the code has changed or this is the first run, make sure any modified static
+files get compressed with `foreman run python manage.py compress`. Static files
+then need to be uploaded correctly with `foreman run python manage.py
+collectstatic`.
  
-This may seem like duplication. It is. The duplication allows your host machine
-to maintain git credentials and manage repository access control so that your
-virtual machine doesn't need sensitive information. Your virtual machine simply
-pulls from the local repository on your local file system without needing
-credentials, etc.
+Run `foreman start`.  `foreman` will load the `.env` file and manage running all
+processes in a way that is similar to that of Heroku. This allows better
+consistency with local, staging, and production deployments.
  
-## Deleting the Virtual Machine
-If you want to start a fresh virtual machine or simply remove the virtual
-machine from your hard drive, Vagrant has a command for that. While in 
-`{project_root}` of the host system, type `vagrant destroy` and confirm with
-`y`. This will remove the VM from your hard drive.
+To run web-only, but no celery or beat, run `foreman start web` to specify
+strictly the web worker.
  
-If you wanted a fresh VM, the next step is to run `vagrant up`, which will
-start a brand new VM (since the old one is gone).
+Press ctrl-C to kill foreman. Foreman will run Django's runserver command.
+If you wish to have more control over how this is done, you can do
+`foreman run python manage.py runserver <options>`. For running any other
+`manage.py` commands, you should also precede them with `foreman run` as just shown.
+This simply ensures that the environment variables from `.env` are present.
+
+# Heroku Install
+
+KarmaNotes runs on Heroku as a webapp. This section addresses what was done
+for KarmaNotes so that other implementations of KarmaWorld can be run on
+Heroku.
+
+Before anything else, download the [Heroku toolbelt](https://toolbelt.heroku.com/).
+
+To run KarmaWorld on Heroku, do `heroku create` and `git push heroku master` as is typical
+for a Heroku application. Set the variable `BUILDPACK_URL` to
+`https://github.com/FinalsClub/heroku-buildpack-karmanotes` to use a buildpack
+designed to support KarmaNotes.
+
+You will need to import the US Department of Education's list of accredited schools.
+   1. Fetch USDE schools with
+      `heroku run python manage.py fetch_usde_csv ./schools.csv`
+   1. Upload the schools into the database with
+      `heroku run python manage.py import_usde_csv ./schools.csv`
+   1. Clean up redundant information with
+      `heroku run python manage.py sanitize_usde_schools`
  
-## Other Vagrant commands
-Please see [vagrant documentation](http://docs.vagrantup.com/v2/cli/index.html)
-for more information on how to use the vagrant CLI to manage your development
-VM.
  
  # Django Database management
  
@@ -331,9 +496,9 @@ VM.
  We have setup Django to use
  [south](http://south.aeracode.org/wiki/QuickStartGuide) for migrations. When
  changing models, it is important to run
-`python {project_root}/manage.py schemamigration` which will create a migration
+`foreman run python manage.py schemamigration` which will create a migration
   to reflect the model changes into the database. These changes can be pulled
-into the database with `python {project_root}/manage.py migrate`.
+into the database with `foreman run python manage.py migrate`.
  
  Sometimes the database already has a migration performed on it, but that
  information wasn't told to south. There are subtleties to the process which
@@ -345,14 +510,13 @@ flag.
  A number of assets have been added to the repository which come from external
  sources. It would be difficult to keep a complete list in this README and keep
  it up to date. Software which originally came from outside parties can
-generally be found in `{project_root}/karmaworld/assets`.
+generally be found in `karmaworld/assets`.
  
  Additionally, all third party Python projects (downloaded and installed with
  pip) are listed in these files:
  
-* `{project_root}/reqs/common.txt`
-* `{project_root}/reqs/dev.txt`
-* `{project_root}/reqs/prod.txt`
+* `requirements.txt`
+* `requirements-dev.txt`
  
  # Thanks