closes #376 adjust min length before autocompleting schools

[oweals/karmaworld.git] / README.md
diff --git a/README.md b/README.md
index c13c9cc7ab9e0056350cd8098cc1e6087582f31e..b06d85b3baea2136745765bfc01b5fa82b391e29 100644
--- a/README.md
+++ b/README.md
@@ -9,9 +9,6 @@ __Contact__: info@karmanotes.org
  
  v3.0 of the karmanotes.org website from the FinalsClub Foundation
  
-
-
-
  # Purpose
  
  KarmaNotes is an online database of college lecture notes.  KarmaNotes empowers college students to participate in the free exchange of knowledge. 
@@ -45,59 +42,196 @@ directory underneath that (`{project_root}/karmaworld`) alongside files like
  
  Notice: This software makes use of external third party services which require
  accounts to access the service APIs. Without these third parties available,
-this software may require considerable overhaul.
+this software may require considerable overhaul. These services have
+API keys, credentials, and other information that you must provide to KarmaNotes
+as environment variables. The best way to persist these environment variables is
+by using a `.env` file. Copy `.env.example` to `.env` and populate the fields as
+required.
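As a rough illustration of the `.env` approach described above, the following sketch parses `KEY=VALUE` lines and reports which required variables are still empty. The helper names `parse_env_lines` and `missing_keys` are hypothetical and are not part of KarmaNotes; the variable names in the example come from later sections of this README.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(required, environ):
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not environ.get(name)]


example = ["# queue settings", "CELERY_QUEUE_NAME=karma-dev", "", "CLOUDAMQP_URL="]
env = parse_env_lines(example)
print(missing_keys(["CELERY_QUEUE_NAME", "CLOUDAMQP_URL"], env))
```

A check like this could run before starting the application, so that an unpopulated field copied from `.env.example` is caught early.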
+
+### Heroku
+This project has chosen to use [Heroku](https://www.heroku.com) to host the Django and
+Celery software. While not a hard requirement, the more up-to-date parts of this
+documentation will operate assuming Heroku is in use.
+
+See README.heroku for more information.
+
+### Celery Queue
+Celery uses the Advanced Message Queuing Protocol (AMQP) for passing messages to its workers.
+We recommend using Heroku's CloudAMQP add-on, getting your own CloudAMQP account, or
+running a queueing system on your own. The `CLOUDAMQP_URL` environment variable must be set correctly
+for KarmaNotes to be able to use Celery. The `CELERY_QUEUE_NAME` environment variable
+must be set to the name of the queue you wish to use. Setting this to something unique
+allows multiple instances of KarmaNotes (or some other software) to share the same queueing server.
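A minimal sketch of how the two variables might be turned into Celery settings. `BROKER_URL` and `CELERY_DEFAULT_QUEUE` are the old-style Celery 3.x setting names; whether KarmaNotes maps the variables exactly this way is an assumption, and `celery_broker_settings` is a hypothetical helper.

```python
def celery_broker_settings(environ):
    """Build Celery settings from the two environment variables.

    Raises KeyError if either variable is missing, since Celery
    cannot be used without them.
    """
    return {
        "BROKER_URL": environ["CLOUDAMQP_URL"],
        "CELERY_DEFAULT_QUEUE": environ["CELERY_QUEUE_NAME"],
    }


# Two instances sharing one (made-up) broker URL stay separate
# by using different queue names.
shared_url = "amqp://user:password@broker.example.com/vhost"
staging = celery_broker_settings(
    {"CLOUDAMQP_URL": shared_url, "CELERY_QUEUE_NAME": "karma-staging"}
)
production = celery_broker_settings(
    {"CLOUDAMQP_URL": shared_url, "CELERY_QUEUE_NAME": "karma-production"}
)
print(staging["CELERY_DEFAULT_QUEUE"], production["CELERY_DEFAULT_QUEUE"])
```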
  
-### Filepicker
-This software uses [Filepicker.io](https://www.inkfilepicker.com/) for uploading
-files. This requires an account with Filepicker.
+### Amazon S3
+The instructions for creating an [S3](http://aws.amazon.com/s3/) bucket may be
+[found on Amazon.](http://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html)
  
-Filepicker requires an additional third party file hosting site where it may
-send uploaded files. For this project, we have used Amazon S3.
+Two separate buckets will be needed in production: one for static file hosting
+and one as a communication bus with Filepicker.
  
-Filepicker will provide an API key. This is needed by the software.
+This software uses S3 to store files which are sent to or received 
+from Filepicker. Filepicker will need to know the S3 bucket name, access key,
+and secret key.
  
-### Amazon S3
+Filepicker users can only make use of an S3 bucket with a paid account. For
+development purposes, no Filepicker S3 bucket is needed. Skip all references to
+the Filepicker S3 bucket in the development case.
+
+The software will not need to know the S3 credentials for the Filepicker
+bucket, because the software will upload files to the Filepicker S3 bucket
+through Filepicker's API and it will link to or download files from the
+Filepicker S3 bucket through Filepicker's URLs. This will be covered in the
+Filepicker section below.
+
+This software uses S3 for hosting static files. The software will need to
+update static files on the S3 bucket. As such, the software will need the
+S3 bucket name, access key, and secret key via the environment variables. This
+is described in subsections below.
+
+To support static hosting, `DEFAULT_FILE_STORAGE` should be set to
+`'storages.backends.s3boto.S3BotoStorage'`, unless there is a compelling reason
+to change it.
+
+There are three ways to set up access to the S3 buckets, trading speed of setup
+against security. The more secure the option, the longer it takes to set up.
+
+#### insecure S3 access
+For quick and dirty insecure S3 access, create a single group and a single user
+with full access to all buckets. Full access to all buckets is insecure!
+
+Create an 
+[Amazon IAM group](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_CreatingAndListingGroups.html)
+with full access to the S3 bucket. Select the "Amazon S3 Full Access" Policy
+Template.
+
+Create an
+[Amazon IAM user](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html).
+Copy the credentials into the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+environment variables. Be sure to write down the access information, as it
+will only be shown once.
+
+#### secure S3 access
+For secure S3 access, two users will be needed: one with access to the
+Filepicker bucket and one with access to the static hosting bucket.
+
+Note: this might need to be modified to prevent creation and deletion of
+buckets?
+
+Create an 
+[Amazon IAM group](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_CreatingAndListingGroups.html)
+with full access to the S3 bucket. The quick way is to select the
+"Amazon S3 Full Access" Policy Template and replace `"Resource": "*"` with 
+`"Resource": "arn:aws:s3:::<static_bucket_name>"`.
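The restricted policy described above might look like the document printed by this sketch. The function name `bucket_policy` and the bucket name are hypothetical. One assumption goes beyond the text: the sketch also lists the `/*` object ARN, because object-level actions in S3 are matched against object ARNs rather than the bucket ARN alone.

```python
import json


def bucket_policy(bucket_name):
    """Return an IAM policy granting all S3 actions on a single bucket."""
    bucket_arn = "arn:aws:s3:::" + bucket_name
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                # Bucket-level actions use the bucket ARN; object-level
                # actions use the "/*" ARN (an addition to the README text).
                "Resource": [bucket_arn, bucket_arn + "/*"],
            }
        ],
    }


print(json.dumps(bucket_policy("example-static-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM policy editor in place of the template's contents.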
+
+Create an
+[Amazon IAM user](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html).
+Copy the credentials into the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+environment variables. Be sure to write down the access information, as it
+will only be shown once.
+
+Ensure the created user is a member of the group with access to the S3
+static files bucket.
+
+Repeat the process again, creating a group for the Filepicker bucket and
+creating a user with access to that group. These credentials will be passed
+on to Filepicker.
+
+#### somewhat secure S3 access
+Create two groups as described in the `secure S3 access` section above.
+
+Create a single user, save the credentials as described in the
+`insecure S3 access` section above, and pass the credentials on to Filepicker.
+
+Add the single user to both groups.
+
+This is less secure because if your web server or Filepicker gets compromised
+(so there are two points for potential failure), the single compromised
+user has full access to both buckets.
  
-#### for Filepicker
-This software uses [Amazon S3](http://aws.amazon.com/s3/) as a third party file
-hosting site. The primary use case is a destination for Filepicker files. The
-software won't directly need any S3 information for this use case; it will be
-provided directly to Filepicker.
+### Amazon Cloudfront CDN
+[Cloudfront CDN](http://aws.amazon.com/cloudfront/) assists static file hosting.
  
-#### for Static File hosting
-A secondary use case for S3 is hosting static files. The software will need to
-update static files on the S3 bucket. In this case, the software will need the
-S3 bucket name, access key, and secret key.
+Follow
+[Amazon's instructions](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/GettingStarted.html)
+to host static files out of the appropriate S3 bucket. Note that Django's static
+file upload process has been modified to mark static files as publicly
+accessible.
  
-The code assumes S3 is used for static files in a production environment. To
-obviate the need for hosting static files through S3 (noting that it still might
-be necessary for Filepicker), a workaround was explained [in this Github ticket](https://github.com/FinalsClub/karmaworld/issues/192#issuecomment-30193617).
+In the settings for the Cloudfront Distribution, copy the "Domain Name" from
+General settings and set `CLOUDFRONT_DOMAIN` to it. For example, `abcdefghij.cloudfront.net`.
  
-That workaround is repeated here. Make the following changes to
-`{project_root}/karmaworld/settings/prod.py`:
+### Amazon Mechanical Turk
+Mechanical Turk is employed to generate human feedback from uploaded notes.
+This service is helpful for generating flash cards and quizzes.
  
-1. comment out everything about static_s3 from imports
-2. comment out storages from the `INSTALLED_APPS`
-3. change `STATIC_URL` to `'/assets/'`
-4. comment out the entire storages section (save for part of `INSTALLED_APPS` and `STATIC_URL`)
-5. add this to the nginx config:
+This service is optional and it might cause unexpected charges when
+deployed.  If the required environment variable is not found,
+then no errors will occur and no mechanical turk tasks will be created, avoiding any unexpected
+costs.
  
-    location /assets/ {
-        root /var/www/karmaworld/karmaworld/;
-    }
+The `MTURK_HOST` environment variable is almost certainly
+`"mechanicalturk.amazonaws.com"`.
+
+The code will create and publish HITs on your behalf.
  
  ### Google Drive
  This software uses [Google Drive](https://developers.google.com/drive/) to
  convert documents to and from various file formats.
  
-A Google Drive service account with access to the Google Drive is required. Thismay be done with a Google Apps account with administrative privileges, or ask
+A Google Drive service account with access to the Google Drive is required.
+This may be done with a Google Apps account with administrative privileges, or by asking
  your business sysadmin.
  
-These are the instructions to create a Google Drive service account:
-https://developers.google.com/drive/delegation
+Follow [Google's instructions](https://developers.google.com/drive/delegation)
+to create a Google Drive service account.
+
+Convert the p12 file into a Base64 encoded string for the
+`GOOGLE_SERVICE_KEY_BASE64` environment variable. There are many ways to do
+this. If Python is available, the
+[binascii library](https://docs.python.org/2/library/binascii.html#binascii.b2a_base64)
+makes this very easy:
+
+        import binascii
+        # Open the key in binary mode and call read() to get its bytes.
+        with open('file.p12', 'rb') as f:
+            print(binascii.b2a_base64(f.read()).decode('ascii'))
+
+Copy the contents of `client_secret_*.apps.googleusercontent.com.json` into the
+`GOOGLE_CLIENT_SECRETS` environment variable.
+
+### Filepicker
+This software uses [Filepicker.io](https://www.inkfilepicker.com/) for uploading
+files. This requires an account with Filepicker.
+
+Filepicker can use an additional third party file hosting site where it may
+send uploaded files. This project, in production, uses Amazon S3 as the third
+party. See the Amazon S3 section above for more information.  
+
+Create a new App with Web SDK and provide the Heroku App URL for the
+Application's URL. You'll be given an API Key for the App. Paste this into the
+`FILEPICKER_API_KEY` environment variable.
+
+Find the 'App Security' button on the left hand side of the web site. Make sure
+'Use Security' is enabled. Generate a new secret key. Paste this key into the
+`FILEPICKER_SECRET` environment variable.
+
+If you have an upgraded plan, you can configure Filepicker to have access to
+your Filepicker S3 bucket. Click 'Amazon S3' on the left hand side menu and
+supply the credentials for the user with access to the Filepicker S3 bucket.
+
+### IndexDen
+KarmaNotes uses IndexDen to create a searchable index of all the notes in the
+system. Create a free IndexDen account at
+[their homepage](http://indexden.com/). You will be given a private URL that
+accesses your IndexDen account. This URL is visible on your dashboard (you
+might need to scroll down).
  
-When completed, you'll have a file called `client_secrets.json` and a p12 file
-which is the key to access the service account. Both are needed by the software.
+Set the `INDEXDEN_PRIVATE_URL` environment variable to your private URL.
+
+Set the `INDEXDEN_INDEX` environment variable to the name of the index you want
+to use for KarmaNotes. The index will be created automatically when KarmaNotes
+is run if it doesn't already exist. It may be created through the GUI if
+desired.
  
  ### Twitter
  
@@ -107,209 +241,71 @@ will be required for this task.
  If this Twitter feature is desired, the consumer key and secret as well as the
  access token key and secret are needed by the software.
  
-If the required files are not found, then no errors will occur.
-
-# Development Install
-
-If you need to setup the project for development, it is highly recommend that
-you grab create a development virtual machine or (if available) grab one that
-has already been created for your site.
+If the required environment variables are not found, then no errors will occur
+and no tweets will be posted.
  
-The *host machine* is the system which runs e.g. VirtualBox, while the
-*virtual machine* refers to the system running inside e.g. VirtualBox. 
-
-## Creating a Virtual Machine by hand
-
-Create a virtual machine with your favorite VM software. Configure the virtual
-machine for production with the steps shown in the [Production Install](#production-install) section.
-
-## Creating a Virtual Machine with Vagrant
-
-Vagrant supports a variety of virtual machine software and there is additional
-support for Vagrant to deploy to a wider variety. However, for these
-instructions, it is assumed Vagrant will be deployed to VirtualBox.
-
-1. Configure external dependencies on the host machine:
-   * Under `{project_root}/karmaworld/secret/`:
-        1. Copy files with the example extension to the corresponding filename
-          without the example extension (e.g.
-          `cp filepicker.py.example filepicker.py`)
-        1. Modify those files, but ignore `db_settings.py` (Vagrant takes care of that one)
-        1. Copy the Google Drive service account p12 file to `drive.p12`
-           (this filename and location may be changed in `drive.py`)
-        1. Ensure `*.py` in `secret/` are never added to the git repo.
-           (.gitignore should help warn against taking this action)
-
-1. Install [VirtualBox](http://www.virtualbox.com/)
-
-1. Install [vagrant](http://www.vagrantup.com/) 1.3 or higher
-
-1. Use Vagrant to create the virtual machine.
-   * While in `cd {project_root}`, type `vagrant up`
-
-1. Connect to the virtual machine with `vagrant ssh`
-
-Note:
-Port 80 of the virtual machine will be configured as port 6659 on the host
-system. While on the host system, fire up your favorite browser and point it at
-`http://localhost:6659/`. This connects to your host system on port 6659, which
-forwards to your virtual machine's web site.
-
-## Completing the Virtual Machine with Fabric
-
-1. On the virtual machine, type `cd karmanotes` to get into the code repository.
-
-1. In the code repo of the VM, type `fab -H 127.0.0.1 first_deploy`
-
-   During this process, you will be queried to create a Django site admin.
-   Provide information. You will be asked to remove duplicate schools. Respond
-   with yes.
-
-# Production Install
-
-These steps are taken care of by automatic utilities. Vagrant performs the
-first subsection of these instructions and Fabric performs the second
-subsection. These instructions are detailed here for good measure, but should
-not generally be needed.
-
-1. Ensure the following are installed:
-   * `git`
-   * `7zip` (for unzipping US Department of Education files)
-   * `PostgreSQL` (server and client)
-   * `nginx`
-   * `libxslt` and `libxml2` (used by some Python libraries)
-   * `RabbitMQ` (server)
-   * `memcached`
-   * `Python`
-   * `PIP`
-   * `virtualenv`
-   * `virtualenvwrapper` (might not be needed anymore)
-
-   On a Debian system supporting Apt, this can be done with:
-
-       sudo apt-get install python-pip postgresql python-virtualenv \
-                            virtualenvwrapper git nginx p7zip-full \
-                            postgresql-server-dev-9.1 libxslt1-dev libxml2-dev \
-                            libmemcached-dev python-dev rabbitmq-server
-
-1. Generate a PostgreSQL database and a role with read/write permissions.
-   * For Debian, these instructions are helpful: https://wiki.debian.org/PostgreSql
-
-1. Modify configuration files.
-   * There are settings in `{project_root}/karmaworld/settings/prod.py`
-       * Most of the setting should work fine by default.
-   * There are additional configuration options for external dependencies
-     under `{project_root}/karmaworld/secret/`.
-        1. Copy files with the example extension to the corresponding filename
-          without the example extension (e.g.
-          `cp filepicker.py.example filepicker.py`)
-        1. Modify those files.
-           * Ensure `PROD_DB_USERNAME`, `PROD_DB_PASSWORD`, and `PROD_DB_NAME`
-             inside `db_settings.py` match the role, password, and database
-             generated in the previous step.
-        1. Copy the Google Drive service account p12 file to `drive.p12`
-           (this filename and location may be changed in `drive.py`)
-        1. Ensure `*.py` in `secret/` are never added to the git repo.
-           (.gitignore should help warn against taking this action)
-
-1. Make sure that /var/www exists, is owned by the www-data group, and that
-   the desired user is a member of the www-data group.
-
-1. Configure nginx with a `proxy_pass` to port 8000 (or whatever port gunicorn
-   will be running the site on) and any virtual hosting that is desired.
-   Here is an example server file to put into `/etc/nginx/sites-available/`
-
-        server {
-            listen 80;
-            # don't do virtual hosting, handle all requests regardless of header
-            server_name "";
-            client_max_body_size 20M;
-        
-            location / {
-                # pass traffic through to gunicorn
-                proxy_pass http://127.0.0.1:8000;
-            }
-        }
-
-1. Configure the system to start supervisor on boot. An init script for
-   supervisor is in the repo at `{project_root}/karmaworld/confs/supervisor`.
-   `update-rc.d supervisor defaults` is the Debian command to load the init
-   script into the correct directories.
-
-1. Make sure `{project_root)/var/log` and `{project_root}/var/run` exist and
-   may be written to, or else put the desired logging and run file paths into
-   `{project_root}/confs/prod/supervisord.conf`
-
-1. Create a virtualenv under `/var/www/karmaworld/venv`
-
-1. Change into the virtualenv with `. /var/www/karmaworld/venv/bin/activate`.
-   Within the virtualenv:
-
-    1. Update the Python depenencies with `pip -i {project_root}/reqs/prod.txt`
-    
-    1. Setup the database with `python {project_root}/manage.py syncdb --migrate`
-
-    1. Collect static resources and put them in the static hosting location with
-       `python {project_root}/manage.py collect_static`
-
-1. The database needs to be populated with schools. A list of accredited schools
-   may be found on the US Department of Education website:
-   http://ope.ed.gov/accreditation/GetDownloadFile.aspx
-
-   Alternatively, use the built-in scripts while in the virtualenv:
-
-   1. Fetch USDE schools with
-      `python {project_root}/manage.py fetch_usde_csv ./schools.csv`
+To set this up,
+[create a new Twitter application](https://dev.twitter.com/apps/new).
+Use your Heroku App URL for the website field. Leave the Callback field blank.
  
-   1. Upload the schools into the database with
-      `python {project_root}/manage.py import_usde _csv ./schools.csv`
+Make sure this application has read/write access. Generate an access token. Go
+to your OAuth settings, and grab the "Consumer key", "Consumer secret",
+"Access token", and "Access token secret". Paste these, respectively, into the
+environment variables `TWITTER_CONSUMER_KEY`, `TWITTER_CONSUMER_SECRET`,
+`TWITTER_ACCESS_TOKEN_KEY`, and `TWITTER_ACCESS_TOKEN_SECRET`.
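The "no errors and no tweets" behaviour for missing variables can be sketched as follows. This is an illustration of the pattern, not KarmaNotes' actual code; `twitter_credentials` is a hypothetical helper, and only the four variable names come from this README.

```python
TWITTER_VARIABLES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN_KEY",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def twitter_credentials(environ):
    """Return all four credentials, or None if any is missing or empty."""
    values = [environ.get(name) for name in TWITTER_VARIABLES]
    if not all(values):
        # The feature is treated as disabled: no error is raised.
        return None
    return dict(zip(TWITTER_VARIABLES, values))


credentials = twitter_credentials({"TWITTER_CONSUMER_KEY": "only-one-value"})
if credentials is None:
    print("Twitter is not configured; skipping tweet.")
```

The same all-or-nothing check would suit the optional Mechanical Turk integration described earlier.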
  
-   1. Clean up redundant information with
-      `python {project_root}/manage.py sanitize_usde_schools`
+### SSL Certificate
  
-1. Startup `supervisor`, which will run `celery` and `gunicorn`. This may be
-   done from within the virtualenv by typing
-   `python {project_root}/manage.py start_supervisord`
+If you wish to host your system publicly, you'll need an SSL certificate
+signed by a proper authority.
  
-1. If everything went well, gunicorn should be running the website on port 8000
-   and nginx should be serving gunicorn on port 80.
+Follow [Heroku's SSL setup](https://devcenter.heroku.com/articles/ssl-endpoint)
+to get SSL running on your server.
  
-# Accessing the Vagrant Virtual Machine
+You may set the `SSL_REDIRECT` environment variable to `true` to make KarmaNotes
+redirect insecure connections to secure ones.
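A sketch of reading a boolean flag such as `SSL_REDIRECT` from the environment. How KarmaNotes actually parses the value is an assumption here; `ssl_redirect_enabled` is a hypothetical helper that treats only the string `true` (in any letter case) as enabled.

```python
def ssl_redirect_enabled(environ):
    """Return True only when SSL_REDIRECT is set to 'true' (any case)."""
    return environ.get("SSL_REDIRECT", "").strip().lower() == "true"


print(ssl_redirect_enabled({"SSL_REDIRECT": "true"}))
print(ssl_redirect_enabled({}))
```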
  
-## Connecting to the VM via SSH
-If you have installed a virtual machine using `vagrant up`, you can connect
-to it by running `vagrant ssh` from `{project_root}`.
+# Local Install
  
-## Connecting to the development website on the VM
-To access the website running on the VM, point your browser at
-http://localhost:6659/ using your host computer.
+KarmaNotes is a Heroku application. Download the [Heroku toolbelt](https://toolbelt.heroku.com/).
  
-Port 6659 on your local machine is set to forward to the VM's port 80.
+Before running it for the first time, there are
+a few setup steps:
+  1. `virtualenv venv`
+  1. `source venv/bin/activate`
+  1. `pip install -r requirements.txt`
+  1. `pip install -r requirements-dev.txt`
+  1. `foreman run python manage.py syncdb --migrate --noinput`
+  1. `foreman run python manage.py createsuperuser`
+  1. `foreman run python manage.py fetch_usde_csv ./schools.csv`
+  1. `foreman run python manage.py import_usde_csv ./schools.csv`
+  1. `foreman run python manage.py sanitize_usde_schools`
  
-Fun fact: 6659 was chosen because of OM (sanskrit) and KW (KarmaWorld) on a
-phone: 66 59.
+To run KarmaNotes locally, make sure you are inside your
+virtual environment (`source venv/bin/activate`) and run `foreman start`.
+Press ctrl-C to kill foreman. Foreman will run Django's runserver command.
+If you wish to have more control over how this is done, you can do
+`foreman run python manage.py runserver <options>`. For running any other
+`manage.py` commands, you should also precede them with `foreman run` as just shown.
+This simply ensures that the environment variables from `.env` are present.
  
-## Updating the VM code repository
-Once connected to the virtual machine by SSH, you will see `karmaworld` in
-the home directory. That is the `{project_root}` in the virtual machine.
+# Heroku Install
  
-`cd karmaworld` and then use `git fetch; git merge` and/or `git pull origin` as
-desired.
+KarmaNotes is a Heroku application. Download the [Heroku toolbelt](https://toolbelt.heroku.com/).
  
-The virtual machine's code repository is set to use your host machine's
-local repository as the origin. So if you make changes locally and commit them,
-without pushing them anywhere, your VM can pull those changes in for testing.
+To run KarmaNotes on Heroku, do `heroku create` and `git push heroku master` as typical
+for a Heroku application. Set the variable `BUILDPACK_URL` to
+`https://github.com/FinalsClub/heroku-buildpack-karmanotes` to use a buildpack
+designed to support KarmaNotes.
  
-This may seem like duplication. It is. The duplication allows your host machine
-to maintain git credentials and manage repository access control so that your
-virtual machine doesn't need sensitive information. Your virtual machine simply
-pulls from the local repository on your local file system without needing
-credentials, etc.
+You will need to import the US Department of Education's list of accredited schools.
+   1. Fetch USDE schools with
+      `heroku run python manage.py fetch_usde_csv ./schools.csv`
+   1. Upload the schools into the database with
+      `heroku run python manage.py import_usde_csv ./schools.csv`
+   1. Clean up redundant information with
+      `heroku run python manage.py sanitize_usde_schools`
  
-## Other Vagrant commands
-Please see [vagrant documentation](http://docs.vagrantup.com/v2/cli/index.html)
-for more information on how to use the vagrant CLI to manage your development
-VM.
  
  # Django Database management
  
@@ -318,9 +314,9 @@ VM.
  We have setup Django to use
  [south](http://south.aeracode.org/wiki/QuickStartGuide) for migrations. When
  changing models, it is important to run
-`python {project_root}/manage.py schemamigration` which will create a migration
+`foreman run python manage.py schemamigration` which will create a migration
   to reflect the model changes into the database. These changes can be pulled
-into the database with `python {project_root}/manage.py migrate`.
+into the database with `foreman run python manage.py migrate`.
  
  Sometimes the database already has a migration performed on it, but that
  information wasn't told to south. There are subtleties to the process which
@@ -332,14 +328,13 @@ flag.
  A number of assets have been added to the repository which come from external
  sources. It would be difficult to keep a complete list in this README and keep
  it up to date. Software which originally came from outside parties can
-generally be found in `{project_root}/karmaworld/assets`.
+generally be found in `karmaworld/assets`.
  
  Additionally, all third party Python projects (downloaded and installed with
  pip) are listed in these files:
  
-* `{project_root}/reqs/common.txt`
-* `{project_root}/reqs/dev.txt`
-* `{project_root}/reqs/prod.txt`
+* `requirements.txt`
+* `requirements-dev.txt`
  
  # Thanks