noting current problems with Drive instructions

diff --git a/README.md b/README.md

index 5737481c5365e98e2459d30c1100ea201bcb44f3..187d69ac52299423150fedd03b6dd2ca89ff19b9 100644 (file)
--- a/README.md
+++ b/README.md
@@ -11,7 +11,14 @@ v3.0 of the karmanotes.org website from the FinalsClub Foundation
  
  # Purpose
  
-KarmaNotes is an online database of college lecture notes.  KarmaNotes empowers college students to participate in the free exchange of knowledge. 
+KarmaWorld is an online database of college lecture notes.  KarmaWorld
+empowers college students to participate in the free exchange of knowledge.
+
+# Naming
+
+The repository and the project are called KarmaWorld. One implementation
+of KarmaWorld, which is run by FinalsClub Foundation, is called
+[KarmaNotes](https://www.karmanotes.org/).
  
  # Pre-Installation
  
@@ -38,449 +45,435 @@ directory underneath that (`{project_root}/karmaworld`) alongside files like
  `fabfile.py` (`{project_root}/fabfile.py`) and `README.md`
  (`{project_root}/README.md`).
  
+## External Software Dependencies
+
+### pdf2htmlEX
+
+KarmaWorld uses [pdf2htmlEX](https://github.com/coolwanglu/pdf2htmlEX) as
+a dependency. pdf2htmlEX is used to convert uploaded PDF notes into HTML.
+
+An [outdated version of pdf2htmlEX](https://github.com/FinalsClub/pdf2htmlEX)
+is available which includes the
+[patch](https://github.com/FinalsClub/pdf2htmlEX/commit/3c19f6abd8d59d1b58cf254b7160b332b3f5b517)
+required for pdf2htmlEX to correctly work with KarmaWorld.
+
+Newer versions can be used by applying the patch by hand. It's a fairly
+simple two-line modification that can be done after installing
+pdf2htmlEX.
+
+### SSL Certificate
+
+If you wish to host your system publicly, you'll almost certainly want
+an SSL certificate signed by a proper authority.
+
+You may need to set the `SSL_REDIRECT` environment variable to `true` to
+make KarmaWorld redirect insecure connections to secure ones.
+
+Follow [Heroku's SSL setup](https://devcenter.heroku.com/articles/ssl-endpoint)
+to get SSL running on your server with Heroku.
+
  ## External Service Dependencies
  
  Notice: This software makes use of external third party services which require
  accounts to access the service APIs. Without these third parties available,
-this software may require considerable overhaul.
+this software may require considerable overhaul. These services have
+API keys, credentials, and other information that you must provide to KarmaWorld
+as environment variables. The best way to persist these environment variables is
+by using a `.env` file.  Copy `.env.example` to `.env` and populate the fields as required.
+
+A number of services are required even if running the KarmaWorld web service
+locally, some of the services are recommended, and some are completely optional
+even if running the web service on Heroku.
+
+Many of these services have free tiers and can be used without charge for
+development testing purposes.
+
+* Reminder
+  * Copy `.env.example` to `.env` and populate the environment variables there.
+* Required Services
+  * [Google Drive](#google-drive)
+  * [Filepicker](#filepicker)
+  * [PostgreSQL](#postgresql)
+  * [Celery](#celery-queue)
+* Optional but recommended
+  * [IndexDen](#indexden): enables searching through courses, notes, etc
+  * [Heroku](#heroku): the production environment used by karmanotes.org
+    * it might not be possible to run KarmaWorld on Heroku using a free
+      webapp.
+  * [Amazon S3](#s3-for-static-files): for static file hosting
+* Entirely optional (though used in the production environment)
+  * [Twitter](#twitter): share updates about new uploads
+  * [Amazon Mechanical Turk](#amazon-mechanical-turk): generate quizzes, flashcards, etc
+  * [Amazon CloudFront](#amazon-cloudfront-cdn)
+  * [Amazon S3](#s3-for-filepicker): store files uploaded to Filepicker
+    * Filepicker does not support S3 storage in its free tier
+
+### Heroku
+This project has chosen to use [Heroku](https://www.heroku.com) to host the Django and
+Celery software. While not a hard requirement, the more up-to-date parts of this
+documentation will operate assuming Heroku is in use.
+
+See README.heroku for more information.
+
+#### pdf2htmlEX on Heroku
+If using Heroku, the default
+[KarmaNotes Heroku buildpack](https://github.com/FinalsClub/heroku-buildpack-karmanotes)
+will [include](https://github.com/FinalsClub/heroku-buildpack-karmanotes/blob/master/bin/steps/pdf2htmlex)
+the [required version of pdf2htmlEX](#pdf2htmlex).
+
+### Celery Queue
+Celery uses the Advanced Message Queuing Protocol (AMQP) for passing messages to its workers.
+
+For production, we recommend using Heroku's CloudAMQP add-on, getting your own CloudAMQP account, or
+running a queueing system on your own. The `CLOUDAMQP_URL` environment variable must be set correctly
+for KarmaWorld to be able to use Celery. The `CELERY_QUEUE_NAME` environment variable
+must be set to the name of the queue you wish to use. Setting this to something unique
+allows multiple instances of KarmaWorld (or some other software) to share the same queueing server.
+
+For development on localhost, `RabbitMQ` is the default for `djcelery` and is well supported. Ensure
+`RabbitMQ` is installed for local development.
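+
+As a minimal sketch, the two Celery variables could appear in `.env` as shown
+here. The broker URL and queue name are placeholders (the URL uses RabbitMQ's
+stock `guest` account on localhost, which is an assumption about your setup),
+and `.env.example` remains the authoritative list of fields.
+
+```
+# placeholder broker URL: replace with your CloudAMQP or RabbitMQ URL
+CLOUDAMQP_URL=amqp://guest:guest@localhost:5672//
+# hypothetical queue name: pick one unique to this KarmaWorld instance
+CELERY_QUEUE_NAME=karmaworld_dev
+```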
+
+### PostgreSQL
+
+PostgreSQL is not necessarily required; other RDBMS could probably be fit into
+place. However, the code was largely written assuming PostgreSQL will be used.
+Change to another system with the caveat that it might take some work.
+
+There are many cloud providers which provide PostgreSQL databases. Heroku has
+an add-on for providing a PostgreSQL database. Ensure something like this
+is made available and installed to the app.
+
+For local development, ensure a PostgreSQL server is running on localhost or is
+otherwise accessible.
  
-### Filepicker
-This software uses [Filepicker.io](https://www.inkfilepicker.com/) for uploading
-files. This requires an account with Filepicker.
+### Amazon S3
+The instructions for creating an [S3](http://aws.amazon.com/s3/) bucket may be
+[found on Amazon.](http://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html)
  
-Filepicker requires an additional third party file hosting site where it may
-send uploaded files. For this project, we have used Amazon S3.
+Two separate buckets may be used in production: one for static file hosting
+and one as a communication bus with Filepicker.
  
-Filepicker will provide an API key. This is needed by the software.
+#### S3 for Filepicker
  
-### Amazon S3
+This software uses S3 to store files which are sent to or received 
+from Filepicker. Filepicker will need to know the S3 bucket name, access key,
+and secret key.
  
-#### for Filepicker
-This software uses [Amazon S3](http://aws.amazon.com/s3/) as a third party file
-hosting site. The primary use case is a destination for Filepicker files. The
-software won't directly need any S3 information for this use case; it will be
-provided directly to Filepicker.
-
-#### for Static File hosting
-A secondary use case for S3 is hosting static files. The software will need to
-update static files on the S3 bucket. In this case, the software will need the
-S3 bucket name, access key, and secret key.
-
-The code assumes S3 is used for static files in a production environment. To
-obviate the need for hosting static files through S3 (noting that it still might
-be necessary for Filepicker), a workaround was explained [in this Github ticket](https://github.com/FinalsClub/karmaworld/issues/192#issuecomment-30193617).
-
-That workaround is repeated here. Make the following changes to
-`{project_root}/karmaworld/settings/prod.py`:
-
-1. comment out everything about static_s3 from imports
-2. comment out storages from the `INSTALLED_APPS`
-3. change `STATIC_URL` to `'/assets/'`
-4. comment out the entire storages section (save for part of `INSTALLED_APPS` and `STATIC_URL`)
-5. add this to the nginx config:
-
-    location /assets/ {
-        root /var/www/karmaworld/karmaworld/;
-    }
-    
-### IndexDen
-KarmaNotes uses IndexDen to create a searchable index of all the notes
-in the system. Create an free IndexDen account at [their homepage](http://indexden.com/).
-You will be given a private URL that accesses your IndexDen account.
-Create a file at karmaworld/secret/indexden.py, and enter your private URL, and the name
-of the index you want KarmaNotes to use. The index will be created automatically when
-KarmaNotes is run if it doesn't already exist. For example,
-```
-PRIVATE_URL = 'http://:secretsecret@secret.api.indexden.com'
-INDEX = 'karmanotes_something_something'
-```
+Filepicker users can only make use of an S3 bucket with a paid account. For
+development purposes, no Filepicker S3 bucket is needed. Skip all references to
+the Filepicker S3 bucket in the development case.
+
+The software will not need to know the S3 credentials for the Filepicker
+bucket, because the software will upload files to the Filepicker S3 bucket
+through Filepicker's API and it will link to or download files from the
+Filepicker S3 bucket through Filepicker's URLs. This will be covered in the
+[Filepicker section](#filepicker).
+
+#### S3 for static files
+
+This software uses S3 for hosting static files. The software will need to
+update static files on the S3 bucket. As such, the software will need the
+S3 bucket name, access key, and secret key via the environment variables. This
+is described in subsections below.
+
+To support static hosting, `DEFAULT_FILE_STORAGE` should be set to
+`'storages.backends.s3boto.S3BotoStorage'`, unless there is a compelling reason
+to change it.
+
+There are three ways to set up access to the S3 buckets, trading off speed
+against security. The more secure the option, the longer it takes to set up.
+
+#### insecure S3 access
+For quick and dirty insecure S3 access, create a single group and a single user
+with full access to all buckets. Full access to all buckets is insecure!
+
+Create an 
+[Amazon IAM group](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_CreatingAndListingGroups.html)
+with full access to the S3 bucket. Select the "Amazon S3 Full Access" Policy
+Template.
+
+Create an
+[Amazon IAM user](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html).
+Copy the credentials into the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+environment variables. Be sure to write down the access information, as it
+will only be shown once.
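+
+A minimal sketch of the resulting `.env` entries follows. The values are
+placeholders, not real credentials; only the two variable names come from the
+instructions above.
+
+```
+# placeholders: paste the values shown once when the IAM user is created
+AWS_ACCESS_KEY_ID=???
+AWS_SECRET_ACCESS_KEY=???
+```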
+
+#### secure S3 access
+For secure S3 access, two users will be needed: one with access to the
+Filepicker bucket and one with access to the static hosting bucket.
+
+Note: this might need to be modified to prevent creation and deletion of
+buckets?
+
+Create an 
+[Amazon IAM group](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_CreatingAndListingGroups.html)
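+with full access to the S3 bucket. The quick way is to select the
+"Amazon S3 Full Access" Policy Template and replace `"Resource": "*"` with
+`"Resource": "arn:aws:s3:::<static_bucket_name>"`. Note that this ARN covers
+only the bucket itself; actions on the objects inside it also need
+`"arn:aws:s3:::<static_bucket_name>/*"` listed as a resource.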
+
+Create an
+[Amazon IAM user](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html).
+Copy the credentials into the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+environment variables. Be sure to write down the access information, as it
+will only be shown once.
+
+Ensure the created user is a member of the group with access to the S3
+static files bucket.
+
+Repeat the process again, creating a group for the Filepicker bucket and
+creating a user with access to that group. These credentials will be passed
+on to Filepicker.
+
+#### somewhat secure S3 access
+Create two groups as described in the `secure S3 access` section above.
+
+Create a single user, save the credentials as described in the
+`insecure S3 access` section above, and pass the credentials on to Filepicker.
+
+Add the single user to both groups.
+
+This is less secure because if your web server or Filepicker gets compromised
+(so there are two points for potential failure), the single compromised
+user has full access to both buckets.
+
+### Amazon CloudFront CDN
+[CloudFront CDN](http://aws.amazon.com/cloudfront/) assists static file hosting.
+
+Follow
+[Amazon's instructions](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/GettingStarted.html)
+to host static files out of the appropriate S3 bucket. Note that Django's static
+file upload process has been modified to mark static files as publicly
+accessible.
+
+In the settings for the CloudFront Distribution, copy the "Domain Name" from
+General settings and set `CLOUDFRONT_DOMAIN` to it. For example, `abcdefghij.cloudfront.net`.
+
+### Amazon Mechanical Turk
+Mechanical Turk is employed to generate human feedback from uploaded notes.
+This service is helpful for generating flash cards and quizzes.
+
+This service is optional and it might cause unexpected charges when
+deployed.  If the required environment variable is not found,
+then no errors will occur and no Mechanical Turk tasks will be created, avoiding any unexpected
+costs.
+
+The `MTURK_HOST` environment variable is almost certainly
+`"mechanicalturk.amazonaws.com"`.
+
+The code will create and publish HITs on your behalf.
  
  ### Google Drive
  This software uses [Google Drive](https://developers.google.com/drive/) to
  convert documents to and from various file formats.
  
-A Google Drive service account with access to the Google Drive is required. Thismay be done with a Google Apps account with administrative privileges, or ask
+A Google Drive service account with access to the Google Drive is required.
+This may be done with a Google Apps account with administrative privileges, or ask
  your business sysadmin.
  
-These are the instructions to create a Google Drive service account:
-https://developers.google.com/drive/delegation
+Follow [Google's instructions](https://developers.google.com/drive/web/auth/web-server)
+to create a Google Drive service account. If using Google Apps, it is worth
+looking at [these instructions](https://developers.google.com/drive/delegation).
  
-When completed, you'll have a file called `client_secrets.json` and a p12 file
-which is the key to access the service account. Both are needed by the software.
+Google Drive used to use p12 files by default. Now a new-style JSON file is
+downloaded by default when creating new credentials. Until the code has been
+[updated](https://github.com/FinalsClub/karmaworld/issues/437) to use the
+new-style JSON file, make sure to click the `Generate a new P12 key` button.
  
-### Twitter
-
-Twitter is used to post updates about new courses. Access to the Twitter API
-will be required for this task.
+Convert the p12 file into a Base64 encoded string for the
+`GOOGLE_SERVICE_KEY_BASE64` environment variable. There are many ways to do
+this. If Python is available, the
+[binascii library](https://docs.python.org/2/library/binascii.html#binascii.b2a_base64)
+makes this very easy:
  
-If this Twitter feature is desired, the consumer key and secret as well as the
-access token key and secret are needed by the software.
+        import binascii
+        # read the key as binary and call read() to get the file contents
+        with open('file.p12', 'rb') as f:
+            print(binascii.b2a_base64(f.read()).decode('ascii'))
  
-If the required files are not found, then no errors will occur.
+The following instructions require creating separate web application
+credentials. Ideally, we will
+[remove this step](https://github.com/FinalsClub/karmaworld/issues/436).
+Until then, web application credentials must be created by hand; this is not yet documented here.
  
-To set this up, create a new Twitter application at https://dev.twitter.com/apps/new.
-Make sure this application has read/write access. Generate an access token. Go to your
-OAuth settings, and grab the "Consumer key", "Consumer secret", "Access token", and
-"Access token secret".
+Copy the contents of `client_secret_*.apps.googleusercontent.com.json` into the
+`GOOGLE_CLIENT_SECRETS` environment variable.
  
-Create a file at karmaworld/secret/twitter.py, and enter these tokens. For example,
-```
-CONSUMER_KEY = '???'
-CONSUMER_SECRET = '???'
-ACCESS_TOKEN_KEY = '???'
-ACCESS_TOKEN_SECRET = '???'
-```
+### Filepicker
+This software uses [Filepicker.io](https://www.inkfilepicker.com/) for uploading
+files. This requires an account with Filepicker.
  
-### SSL Certificate
+Filepicker can use an additional third party file hosting site where it may
+send uploaded files. This project, in production, uses Amazon S3 as the third
+party. See the Amazon S3 section above for more information.  
  
-If you wish to host your system publicly, you'll need an SSL certificate
-signed by a proper authority.
+Create a new App with Web SDK and provide the Heroku App URL for the
+Application's URL. You'll be given an API Key for the App. Paste this into the
+`FILEPICKER_API_KEY` environment variable.
  
-If you are working on local system for development, a self signed certificate
-will suffice. There are plenty of resources available for learning how to
-create one, so that will not be detailed here. Note that the Vagrant file will
-automatically generated a self signed certificate within the virtual machine.
+Find the 'App Security' button on the left hand side of the web site. Make sure
+'Use Security' is enabled. Generate a new secret key. Paste this key into the
+`FILEPICKER_SECRET` environment variable.
  
-The certificate should be installed using nginx.
+If you have an upgraded plan, you can configure Filepicker to have access to
+your Filepicker S3 bucket. Click 'Amazon S3' on the left hand side menu and
+supply the credentials for the user with access to the Filepicker S3 bucket.
  
-# Development Install
+### IndexDen
+KarmaWorld uses IndexDen to create a searchable index of all the notes in the
+system. Create a free IndexDen account at
+[their homepage](http://indexden.com/). You will be given a private URL that
+accesses your IndexDen account. This URL is visible on your dashboard (you
+might need to scroll down).
  
-If you need to setup the project for development, it is highly recommend that
-you grab create a development virtual machine or (if available) grab one that
-has already been created for your site.
+Set the `INDEXDEN_PRIVATE_URL` environment variable to your private URL.
  
-The *host machine* is the system which runs e.g. VirtualBox, while the
-*virtual machine* refers to the system running inside e.g. VirtualBox. 
+Set the `INDEXDEN_INDEX` environment variable to the name of the index you want
+to use for KarmaWorld. The index will be created automatically when KarmaWorld
+is run if it doesn't already exist. It may be created through the GUI if
+desired.
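+
+As a sketch, the two IndexDen entries in `.env` could look like the following.
+The values are illustrative only, reusing the placeholder URL and index name
+from the older `karmaworld/secret/indexden.py` example; substitute the private
+URL from your own dashboard.
+
+```
+# placeholder private URL: copy the real one from your IndexDen dashboard
+INDEXDEN_PRIVATE_URL=http://:secretsecret@secret.api.indexden.com
+# placeholder index name
+INDEXDEN_INDEX=karmanotes_something_something
+```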
  
-## Creating a Virtual Machine by hand
+### Twitter
  
-Create a virtual machine with your favorite VM software. Configure the virtual
-machine for production with the steps shown in the [Production Install](#production-install) section.
+Twitter is used to post updates about new courses. Access to the Twitter API
+will be required for this task.
  
-## Creating a Virtual Machine with Vagrant
+If this Twitter feature is desired, the consumer key and secret as well as the
+access token key and secret are needed by the software.
  
-Vagrant supports a variety of virtual machine software and there is additional
-support for Vagrant to deploy to a wider variety. However, for these
-instructions, it is assumed Vagrant will be deployed to VirtualBox.
+If the required environment variables are not found, then no errors will occur
+and no tweets will be posted.
  
-1. Configure external dependencies on the host machine:
-   * Under `{project_root}/karmaworld/secret/`:
-        1. Copy files with the example extension to the corresponding filename
-          without the example extension (e.g.
-          `cp filepicker.py.example filepicker.py`)
-        1. Modify those files, but ignore `db_settings.py` (Vagrant takes care of that one)
-        1. Copy the Google Drive service account p12 file to `drive.p12`
-           (this filename and location may be changed in `drive.py`)
-        1. Ensure `*.py` in `secret/` are never added to the git repo.
-           (.gitignore should help warn against taking this action)
+To set this up,
+[create a new Twitter application](https://dev.twitter.com/apps/new).
+Use your Heroku App URL for the website field. Leave the Callback field blank.
  
-1. Install [VirtualBox](http://www.virtualbox.org/)
+Make sure this application has read/write access. Generate an access token. Go
+to your OAuth settings, and grab the "Consumer key", "Consumer secret",
+"Access token", and "Access token secret". Paste these, respectively, into the
+environment variables `TWITTER_CONSUMER_KEY`, `TWITTER_CONSUMER_SECRET`,
+`TWITTER_ACCESS_TOKEN_KEY`, `TWITTER_ACCESS_TOKEN_SECRET`.
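+
+A minimal sketch of the corresponding `.env` entries, with `???` standing in
+for the values copied from the Twitter OAuth settings page:
+
+```
+# placeholders: copy each value from the application's OAuth settings
+TWITTER_CONSUMER_KEY=???
+TWITTER_CONSUMER_SECRET=???
+TWITTER_ACCESS_TOKEN_KEY=???
+TWITTER_ACCESS_TOKEN_SECRET=???
+```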
  
-1. Install [vagrant](http://www.vagrantup.com/) 1.3 or higher
+# Local
  
-1. Use Vagrant to create the virtual machine.
-   * While in `cd {project_root}`, type `vagrant up`
+## Configuring foreman
  
-1. Connect to the virtual machine with `vagrant ssh`
+KarmaNotes runs on Heroku as a webapp and thus makes use of a Procfile. While
+not strictly necessary, KarmaWorld can use the same basic Procfile, which is
+convenient and consistent.
  
-Note:
-Port 443 of the virtual machine will be configured as port 6659 on the host
-system. While on the host system, fire up your favorite browser and point it at
-`https://localhost:6659/`. This connects to your host system on port 6659, which
-forwards to your virtual machine's web site using SSL.
+To use the Procfile locally, we recommend using `foreman`. To install `foreman`
+and other Heroku tools, install the
+[Heroku toolbelt](https://toolbelt.heroku.com/).
  
-Port 80 of the virtual machine will be configured as port 16659 on the host
-system. While on the host system, fire up your favorite browser and point it at
-`http://localhost:16659/`. This connects to your host system on port 16659,
-which forwards to your virtual machine's web site using plain text.
+Ensure environment variables are available to `foreman` by copying
+`.env.example` to `.env` and updating those variables as appropriate for your
+local system.
  
-## Completing the Virtual Machine with Fabric
-
-*Notice* Fabric might not run properly if you presently in a virtualenv.
-`deactivate` prior to running fab commands.
-
-### From the Host Machine
-
-If Fabric is available on the host machine, you should be able to run Fabric
-commands directly on the host machine, pointed at the virtual machine. If
-Fabric is not available on the Host Machine, see the next section.
-
-To setup the host machine properly, see the section about
-[accessing the VM via fabric](#accessing-the-vm-via-fabric) and then return to
-this section.
-
-Assuming those steps were followed with the alias, the following instructions
-should complete the virtual machine setup:
-
-1. `cd {project_root}` on the host machine.
-
-1. type `vmfab first_deploy`.
-
-### From within the Virtual Machine
-
-If Fabric is not available on the host machine, or just for funsies, you may
-run the Fabric commands within the virtual machine.
-
-1. Connect to the virtual machine with `vagrant ssh`.
-
-1. On the virtual machine, type `cd karmanotes` to get into the code
-   repository.
-
-1. In the code repo of the VM, type `fab -H 127.0.0.1 first_deploy`
-
-   During this process, you will be queried to create a Django site admin.
-   Provide information. You will be asked to remove duplicate schools. Respond
-   with yes.
-
-# Production Install
-
-These steps are taken care of by automatic utilities. Vagrant performs the
-first subsection of these instructions and Fabric performs the second
-subsection. These instructions are detailed here for good measure, but should
-not generally be needed.
-
-1. Ensure the following are installed:
-   * `git`
-   * `7zip` (for unzipping US Department of Education files)
-   * `PostgreSQL` (server and client)
-   * `nginx`
-   * `libxslt` and `libxml2` (used by some Python libraries)
-   * `RabbitMQ` (server)
-   * `memcached`
-   * `Python`
-   * `PIP`
-   * `virtualenv`
-   * `virtualenvwrapper` (might not be needed anymore)
-   * `pdf2htmlEX`
-
-   On a Debian system supporting Apt, this can be done with:
-```
-    sudo apt-get install python-pip postgresql python-virtualenv nginx \
-    virtualenvwrapper git libxml2-dev p7zip-full \
-    postgresql-server-dev-9.1 libxslt1-dev \
-    libmemcached-dev python-dev rabbitmq-server \
-    cmake libpng-dev libjpeg-dev libgtk2.0-dev \
-    pkg-config libfontconfig1-dev autoconf libtool
-
-    wget http://poppler.freedesktop.org/poppler-0.24.4.tar.xz
-    tar xf poppler-0.24.4.tar.xz
-    cd poppler-0.24.4
-    ./configure --prefix=/usr --enable-xpdf-headers
-    make
-    sudo make install
-    cd ~/
-
-    git clone https://github.com/fontforge/fontforge.git
-    cd fontforge
-    ./bootstrap
-    ./configure --prefix=/usr
-    make
-    sudo make install
-    cd ~/
-
-    git clone https://github.com/charlesconnell/pdf2htmlEX.git
-    cd pdf2htmlEX
-    ./configure --prefix=/usr
-    cmake .
-    make
-    sudo make install
-```
-
-1. Generate a PostgreSQL database and a role with read/write permissions.
-   * For Debian, these instructions are helpful: https://wiki.debian.org/PostgreSql
-
-1. Modify configuration files.
-   * There are settings in `{project_root}/karmaworld/settings/prod.py`
-       * Most of the setting should work fine by default.
-   * There are additional configuration options for external dependencies
-     under `{project_root}/karmaworld/secret/`.
-        1. Copy files with the example extension to the corresponding filename
-          without the example extension (e.g.
-          `cp filepicker.py.example filepicker.py`)
-        1. Modify those files.
-           * Ensure `PROD_DB_USERNAME`, `PROD_DB_PASSWORD`, and `PROD_DB_NAME`
-             inside `db_settings.py` match the role, password, and database
-             generated in the previous step.
-        1. Copy the Google Drive service account p12 file to `drive.p12`
-           (this filename and location may be changed in `drive.py`)
-        1. Ensure `*.py` in `secret/` are never added to the git repo.
-           (.gitignore should help warn against taking this action)
-
-1. Make sure that /var/www exists, is owned by the www-data group, and that
-   the desired user is a member of the www-data group.
-
-1. Configure nginx with a `proxy_pass` to port 8000 (or whatever port gunicorn
-   will be running the site on) and any virtual hosting that is desired.
-   Here is an example server file to put into `/etc/nginx/sites-available/`
-
-        server {
-            listen 80;
-            server_name localhost;
-            return 301 https://$host$request_uri;
-        }
-
-        server {
-            listen 443;
-            ssl on;
-            server_name localhost;
-            client_max_body_size 20M;
-        
-            location / {
-                # pass traffic through to gunicorn
-                proxy_pass http://127.0.0.1:8000;
-                # pass HTTP(S) status through to Django
-                proxy_set_header X-Forwarded-SSL $https;
-                proxy_set_header X-Forwarded-Protocol $scheme;
-                proxy_set_header X-Forwarded-Proto $scheme;
-                # pass nginx site back to Django
-                proxy_set_header Host $http_host;
-            }
-        }
-
-1. Configure the system to start supervisor on boot. An init script for
-   supervisor is in the repo at `{project_root}/karmaworld/confs/supervisor`.
-   `update-rc.d supervisor defaults` is the Debian command to load the init
-   script into the correct directories.
-
-1. Make sure `{project_root)/var/log` and `{project_root}/var/run` exist and
-   may be written to, or else put the desired logging and run file paths into
-   `{project_root}/confs/prod/supervisord.conf`
-
-1. Create a virtualenv under `/var/www/karmaworld/venv`
-
-1. Change into the virtualenv with `. /var/www/karmaworld/venv/bin/activate`.
-   Within the virtualenv:
-
-    1. Update the Python depenencies with `pip -i {project_root}/reqs/prod.txt`
-        * If you want debugging on a production-like system:
-            1. run `pip -i {project_root}/reqs/vmdev.txt`
-            1. change `{project_root}/manage.py` to point at `vmdev.py`
-               instead of `prod.py`
-            1. ensure firefox is installed on the system (such as by
-               `sudo apt-get install firefox`)
-    
-    1. Setup the database with `python {project_root}/manage.py syncdb --migrate`
-
-    1. Collect static resources and put them in the static hosting location with
-       `python {project_root}/manage.py collect_static`
-
-1. The database needs to be populated with schools. A list of accredited schools
-   may be found on the US Department of Education website:
-   http://ope.ed.gov/accreditation/GetDownloadFile.aspx
-
-   Alternatively, use the built-in scripts while in the virtualenv:
+## pdf2htmlEX
  
-   1. Fetch USDE schools with
-      `python {project_root}/manage.py fetch_usde_csv ./schools.csv`
+This project uses [pdf2htmlEX](https://github.com/coolwanglu/pdf2htmlEX) as
+a dependency. pdf2htmlEX is used to convert uploaded PDF notes into HTML. It
+needs to be installed on the same system that KarmaWorld is running on.
  
-   1. Upload the schools into the database with
-      `python {project_root}/manage.py import_usde _csv ./schools.csv`
+### using their source
  
-   1. Clean up redundant information with
-      `python {project_root}/manage.py sanitize_usde_schools`
+See their instructions at
+[https://github.com/coolwanglu/pdf2htmlEX/wiki/Building](https://github.com/coolwanglu/pdf2htmlEX/wiki/Building).
  
-1. Startup `supervisor`, which will run `celery` and `gunicorn`. This may be
-   done from within the virtualenv by typing
-   `python {project_root}/manage.py start_supervisord`
+Make sure to [patch](https://github.com/FinalsClub/pdf2htmlEX/commit/3c19f6abd8d59d1b58cf254b7160b332b3f5b517)
+the source code to expose two variables.
  
-1. If everything went well, gunicorn should be running the website on port 8000
-   and nginx should be serving gunicorn on port 80.
+### using our fork
  
-# Update a deployed system
+You can use FinalsClub's [outdated version of pdf2htmlEX](https://github.com/FinalsClub/pdf2htmlEX).
+See their installation instructions above, but don't worry about patching.
  
-Once code has been updated, the running web service will need to be updated
-to stay in sync with the code.
+### using their PPA
  
-## Fabric
+You can use [their upstream PPA](https://launchpad.net/~coolwanglu/+archive/ubuntu/pdf2htmlex).
  
-Run the `deploy` fab command. For example:
-`fab -H 127.0.0.1 deploy`
+        apt-add-repository ppa:coolwanglu/pdf2htmlex
+        apt-get update
+        apt-get install pdf2htmlex
  
-## By Hand
+Then patch the JavaScript on your system by running this code in the shell.
  
-1. pull code in from the repo with `git pull`
-1. If any Python requirements have changed, install/upgrade them:
-    `pip install -r --upgrade reqs/prod.txt`
-1. If the database has changed, update the database with:
-    `python manage.py syncdb --migrate`
-1. If any static files have changed, synchornize them with;
-    `python manage.py collectstatic`
-1. Django will probably need a restart.
-    * For a dev system, ctrl-c the running process and restart it.
-    * For a production system, there are two options.
-        * `python manage.py restart_supervisord` if far reaching changes
-          have been made (that might effect celery, beat, etc)
-        * `python manage.py restart_gunicorn` if only minor Django changes
-          have been made
-        * If you are uncertain, best bet is to restart supervisord.
+        cat >> `dpkg -L pdf2htmlex | grep pdf2htmlEX.js` <<PDF2HTMLEXHACK
+        Viewer.prototype['rescale'] = Viewer.prototype.rescale;
+        Viewer.prototype['scroll_to'] = Viewer.prototype.scroll_to;
+        PDF2HTMLEXHACK
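
Running the append above twice would add the two lines twice. A minimal sketch of a guard against that, assuming a POSIX shell: `append_once` is a hypothetical helper, not part of KarmaWorld or pdf2htmlEX, and it only appends a line when that exact line is not already in the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of KarmaWorld or pdf2htmlEX):
# append a line to a file only if that exact line is not already present.
append_once() {
    file=$1
    line=$2
    # -F: treat the pattern as a fixed string, -x: match whole lines, -q: no output
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Assumed usage, reusing the dpkg lookup from the README:
#   js=$(dpkg -L pdf2htmlex | grep pdf2htmlEX.js)
#   append_once "$js" "Viewer.prototype['rescale'] = Viewer.prototype.rescale;"
#   append_once "$js" "Viewer.prototype['scroll_to'] = Viewer.prototype.scroll_to;"
```

The sketch assumes the target file already ends with a newline; if it does not, the first appended line would be joined to the file's last line.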
  
-# Accessing the Vagrant Virtual Machine
+## Install
  
-## Accessing the VM via Fabric
-If you have Fabric on the host machine, you can configure your host machine
-to run Fabric against the virtual machine.
+  1. `virtualenv venv`
+  1. `source venv/bin/activate`
+  1. `pip install -r requirements.txt`
+    * on Debian systems, some packages are required for pip to succeed:
+    * `apt-get install python-dev libpython-dev python-psycopg2 libmemcached-dev libffi-dev libssl-dev postgresql-server-dev-X.Y`
+  1. `pip install -r requirements-dev.txt`
  
-You will need to setup the host machine with the proper SSH credentials to
-access the virtual machine. This is done by running `vagrant ssh-config` from
-`{project_root}` and copying the results into your SSH configuration file
-(usually found at `~/.ssh/config`).
+## Configuration
  
-The VM will, by default, route its SSH connection through localhost port 2222
-on the host machine and the base user with be vagrant. Point Fabric there when
-running fab commands from `{project_root}`. So the command will look like this:
+Make sure [External Service Dependencies](#external_service_dependencies) are
+satisfied. This includes running a local database and RabbitMQ instance as
+desired.
  
-        fab -H 127.0.0.1 --port=2222 -u vagrant <commands>
+  1. configure `.env` as per [instructions](#external_service_dependencies)
+  1. `foreman run python manage.py syncdb --migrate --noinput`
+  1. `foreman run python manage.py createsuperuser`
+  1. `foreman run python manage.py fetch_usde_csv ./schools.csv`
+  1. `foreman run python manage.py import_usde_csv ./schools.csv`
+  1. `foreman run python manage.py sanitize_usde_schools`
  
-In unix, it might be convenient to create and use an alias like so:
+* `fetch_usde_csv` downloads school records and stores them to `./schools.csv`. This file name
+     and location are arbitrary. As long as the same file is passed into `import_usde_csv` for
+     reading, everything should be fine.
  
-        alias vmfab='fab -H 127.0.0.1 --port=2222 -u vagrant'
-        vmfab <commands>
+* `fetch_usde_csv` requires `7zip` to be installed for processing compressed
+     archives. On Debian-based systems, this entails `apt-get install p7zip-full`.
  
-Removing a unix alias is done with `unalias`.
+## Run
  
-## Connecting to the VM via SSH
-If you have installed a virtual machine using `vagrant up`, you can connect
-to it by running `vagrant ssh` from `{project_root}`.
+Make sure you are inside your virtual environment (`source venv/bin/activate`).
  
-## Connecting to the development website on the VM
-To access the website running on the VM, point your browser at
-http://localhost:6659/ using your host computer.
+If the code has changed or this is the first run, make sure any modified static
+files get compressed with `foreman run python manage.py compress`. Static files
+then need to be uploaded correctly with `foreman run python manage.py
+collectstatic`.
  
-Port 6659 on your local machine is set to forward to the VM's port 80.
+Run `foreman start`.  `foreman` will load the `.env` file and manage running all
+processes in a way that is similar to that of Heroku. This allows better
+consistency with local, staging, and production deployments.
  
-Fun fact: 6659 was chosen because of OM (sanskrit) and KW (KarmaWorld) on a
-phone: 66 59.
+To run only the web process, without celery or beat, run `foreman start web`
+to start strictly the web worker.
  
-## Updating the VM code repository
-Once connected to the virtual machine by SSH, you will see `karmaworld` in
-the home directory. That is the `{project_root}` in the virtual machine.
+Press ctrl-C to stop foreman. Foreman will run Django's `runserver` command.
+If you wish to have more control over how this is done, you can run
+`foreman run python manage.py runserver <options>`. Any other `manage.py`
+command should likewise be preceded by `foreman run`.
+This simply ensures that the environment variables from `.env` are present.
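
To illustrate what "environment variables from `.env` are present" means, here is a rough sketch in POSIX shell. `run_with_env` is a hypothetical helper, not how foreman is implemented: foreman parses `.env` itself, whereas this sketch sources the file as shell, so it only behaves similarly when the file contains plain `KEY=value` lines.

```shell
#!/bin/sh
# Hypothetical helper approximating the effect of `foreman run`:
# export every variable assigned in an env file, then run a command.
run_with_env() {
    env_file=$1
    shift
    # The `.` builtin searches PATH for names without a slash, so force a path.
    case "$env_file" in
        */*) ;;
        *) env_file=./$env_file ;;
    esac
    (
        set -a          # export every variable assigned from here on
        . "$env_file"   # read KEY=value lines as shell assignments
        set +a
        "$@"            # run the requested command with those variables set
    )
}

# Assumed usage:
#   run_with_env .env python manage.py runserver
```

The subshell keeps the variables from leaking into the calling shell, which roughly mirrors how `foreman run` only affects the command it launches.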
  
-`cd karmaworld` and then use `git fetch; git merge` and/or `git pull origin` as
-desired.
+# Heroku Install
  
-The virtual machine's code repository is set to use your host machine's
-local repository as the origin. So if you make changes locally and commit them,
-without pushing them anywhere, your VM can pull those changes in for testing.
+KarmaNotes runs on Heroku as a webapp. This section addresses what was done
+for KarmaNotes so that other implementations of KarmaWorld can be run on
+Heroku.
  
-This may seem like duplication. It is. The duplication allows your host machine
-to maintain git credentials and manage repository access control so that your
-virtual machine doesn't need sensitive information. Your virtual machine simply
-pulls from the local repository on your local file system without needing
-credentials, etc.
+Before anything else, download the [Heroku toolbelt](https://toolbelt.heroku.com/).
  
-## Deleting the Virtual Machine
-If you want to start a fresh virtual machine or simply remove the virtual
-machine from your hard drive, Vagrant has a command for that. While in 
-`{project_root}` of the host system, type `vagrant destroy` and confirm with
-`y`. This will remove the VM from your hard drive.
+To run KarmaWorld on Heroku, do `heroku create` and `git push heroku master` as is typical
+for a Heroku application. Set the variable `BUILDPACK_URL` to
+`https://github.com/FinalsClub/heroku-buildpack-karmanotes` to use a buildpack
+designed to support KarmaNotes.
  
-If you wanted a fresh VM, the next step is to run `vagrant up`, which will
-start a brand new VM (since the old one is gone).
+You will need to import the US Department of Education's list of accredited schools.
+   1. Fetch USDE schools with
+      `heroku run python manage.py fetch_usde_csv ./schools.csv`
+   1. Upload the schools into the database with
+      `heroku run python manage.py import_usde_csv ./schools.csv`
+   1. Clean up redundant information with
+      `heroku run python manage.py sanitize_usde_schools`
  
-## Other Vagrant commands
-Please see [vagrant documentation](http://docs.vagrantup.com/v2/cli/index.html)
-for more information on how to use the vagrant CLI to manage your development
-VM.
  
  # Django Database management
  
@@ -489,9 +482,9 @@ VM.
  We have setup Django to use
  [south](http://south.aeracode.org/wiki/QuickStartGuide) for migrations. When
  changing models, it is important to run
-`python {project_root}/manage.py schemamigration` which will create a migration
+`foreman run python manage.py schemamigration` which will create a migration
   to reflect the model changes into the database. These changes can be pulled
-into the database with `python {project_root}/manage.py migrate`.
+into the database with `foreman run python manage.py migrate`.
  
  Sometimes the database already has a migration performed on it, but that
  information wasn't told to south. There are subtleties to the process which
@@ -503,15 +496,13 @@ flag.
  A number of assets have been added to the repository which come from external
  sources. It would be difficult to keep a complete list in this README and keep
  it up to date. Software which originally came from outside parties can
-generally be found in `{project_root}/karmaworld/assets`.
+generally be found in `karmaworld/assets`.
  
  Additionally, all third party Python projects (downloaded and installed with
  pip) are listed in these files:
  
-* `{project_root}/reqs/common.txt`
-* `{project_root}/reqs/dev.txt`
-* `{project_root}/reqs/prod.txt`
-* `{project_root}/reqs/vmdev.txt` (just a combo of dev.txt and prod.txt)
+* `requirements.txt`
+* `requirements-dev.txt`
  
  # Thanks