X-Git-Url: https://git.librecmc.org/?a=blobdiff_plain;f=README.md;h=70034ec16af1e5fdbe10555f906d85d6fc796600;hb=14643edabedd0f03639c72fa1c068912e6f128cd;hp=f6998bb1c36bf1af1c1c9d01808d5d53cc0cd7b8;hpb=e79f61fa27aebe5deadfbea750985ac5e04c5171;p=oweals%2Fkarmaworld.git

diff --git a/README.md b/README.md
index f6998bb..70034ec 100644
--- a/README.md
+++ b/README.md
@@ -9,15 +9,23 @@ __Contact__: info@karmanotes.org
 
 v3.0 of the karmanotes.org website from the FinalsClub Foundation
 
+# Purpose
 
+KarmaWorld is an online database of college lecture notes. KarmaWorld
+empowers college students to participate in the free exchange of knowledge.
 
+# Naming
 
-# Purpose
-
-KarmaNotes is an online database of college lecture notes. KarmaNotes empowers college students to participate in the free exchange of knowledge.
+The repository and the project are called KarmaWorld. One implementation
+of KarmaWorld, which is run by FinalsClub Foundation, is called
+[KarmaNotes](https://www.karmanotes.org/).
 
 # Pre-Installation
 
+## Code
+
+Before doing anything, you'll need the code. Grab it from github.
+
 Clone the project from the central repo using your github account:
 
     git clone git@github.com:FinalsClub/karmaworld.git
@@ -37,89 +45,480 @@ directory underneath that (`{project_root}/karmaworld`) alongside files like
 `fabfile.py` (`{project_root}/fabfile.py`) and `README.md`
 (`{project_root}/README.md`).
 
-# Development Install
+## External Software Dependencies
+
+### pdf2htmlEX
+
+KarmaWorld uses [pdf2htmlEX](https://github.com/coolwanglu/pdf2htmlEX) as
+a dependency. pdf2htmlEX is used to convert uploaded PDF notes into HTML.
+
+An [outdated version of pdf2htmlEX](https://github.com/FinalsClub/pdf2htmlEX)
+is available which includes the
+[patch](https://github.com/FinalsClub/pdf2htmlEX/commit/3c19f6abd8d59d1b58cf254b7160b332b3f5b517)
+required for pdf2htmlEX to correctly work with KarmaWorld.
+
+Newer versions can be used by applying the patch by hand. It's a fairly
+simple two-line modification that can be done after installing
+pdf2htmlEX.
+
+### SSL Certificate
+
+If you wish to host your system publicly, you'll almost certainly want
+an SSL certificate signed by a proper authority.
+
+You may need to set the `SSL_REDIRECT` environment variable to `true` to
+make KarmaWorld redirect insecure connections to secure ones.
+
+Follow [Heroku's SSL setup](https://devcenter.heroku.com/articles/ssl-endpoint)
+to get SSL running on your server with Heroku.
+
+## External Service Dependencies
+
+Notice: A number of services are required even if running the KarmaWorld web
+service [locally](#local). Some of the services are recommended, and some are
+completely optional even if running the web service on Heroku.
+
+This software makes use of external third party services which require
+accounts to access the service APIs. Without these third parties available,
+this software may require considerable overhaul. These services have API keys,
+credentials, and other information that you must provide to KarmaWorld
+as environment variables.
+
+The best way to persist these API keys in environment variables is by using a
+`.env` file. Copy `.env.example` to `.env` and populate the fields as required.
+
+Many of these services have free tiers and can be used without charge for
+development testing purposes.
+
+* Reminder
+    * Copy `.env.example` to `.env` and populate the environment variables there.
+* Required Services
+    * [Google Drive](#google-drive)
+    * [Filepicker](#filepicker)
+    * [PostgreSQL](#postgresql)
+    * [Celery](#celery-queue)
+* Optional but recommended
+    * [IndexDen](#indexden): enables searching through courses, notes, etc
+    * [Heroku](#heroku): the production environment used by karmanotes.org
+        * it might not be possible to run KarmaWorld on Heroku using a free
+          webapp.
+    * [Amazon S3](#s3-for-static-files): for static file hosting
+* Entirely optional (though used in the production environment)
+    * [Twitter](#twitter): share updates about new uploads
+    * [Amazon Mechanical Turk](#amazon-mechanical-turk): generate quizzes, flashcards, etc
+    * [Amazon CloudFront](#amazon-cloudfront-cdn)
+    * [Amazon S3](#s3-for-filepicker): store files uploaded to Filepicker
+        * Filepicker does not support S3 storage in its free tier
+
+### Heroku
+This project has chosen to use [Heroku](https://www.heroku.com) to host the Django and
+celery software. While not a hard requirement, the more up-to-date parts of this
+documentation assume Heroku is in use.
+
+See README.heroku for more information.
+
+#### pdf2htmlEX on Heroku
+If using Heroku, the default
+[KarmaNotes Heroku buildpack](https://github.com/FinalsClub/heroku-buildpack-karmanotes)
+will [include](https://github.com/FinalsClub/heroku-buildpack-karmanotes/blob/master/bin/steps/pdf2htmlex)
+the [required version of pdf2htmlEX](#pdf2htmlex).
+
+### Celery Queue
+Celery uses the Advanced Message Queuing Protocol (AMQP) for passing messages to its workers.
+
+For production, we recommend using Heroku's CloudAMQP add-on, getting your own CloudAMQP account, or
+running a queueing system on your own. The `CLOUDAMQP_URL` environment variable must be set correctly
+for KarmaWorld to be able to use Celery. The `CELERY_QUEUE_NAME` environment variable
+must be set to the name of the queue you wish to use. Setting this to something unique
+allows multiple instances of KarmaWorld (or some other software) to share the same queueing server.
+
+For development on localhost, `RabbitMQ` is the default for `djcelery` and is well supported. Ensure
+`RabbitMQ` is installed for local development.
+
+### PostgreSQL
+
+PostgreSQL is not necessarily required; other RDBMS could probably be fit into
+place. However, the code was largely written assuming PostgreSQL will be used.
+Change to another system with the caveat that it might take some work.
+
+There are many cloud providers which provide PostgreSQL databases. Heroku has
+an add-on for providing a PostgreSQL database. Ensure something like this
+is made available and installed to the app.
+
+For local development, ensure a PostgreSQL is running on localhost or is
+otherwise accessible.
+
+### Amazon S3
+The instructions for creating an [S3](http://aws.amazon.com/s3/) bucket may be
+[found on Amazon.](http://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html)
+
+Two separate buckets may be used in production: one for static file hosting
+and one as a communication bus with Filepicker.
+
+#### S3 for Filepicker
+
+This software uses S3 to store files which are sent to or received
+from Filepicker. Filepicker will need to know the S3 bucket name, access key,
+and secret key.
+
+Filepicker users can only make use of an S3 bucket with a paid account. For
+development purposes, no Filepicker S3 bucket is needed. Skip all references to
+the Filepicker S3 bucket in the development case.
+
+The software will not need to know the S3 credentials for the Filepicker
+bucket, because the software will upload files to the Filepicker S3 bucket
+through Filepicker's API and it will link to or download files from the
+Filepicker S3 bucket through Filepicker's URLs. This will be covered in the
+[Filepicker section](#filepicker).
+
+#### S3 for static files
+
+This software uses S3 for hosting static files. The software will need to
+update static files on the S3 bucket. As such, the software will need the
+S3 bucket name, access key, and secret key via the environment variables. This
+is described in subsections below.
+
+To support static hosting, `DEFAULT_FILE_STORAGE` should be set to
+`'storages.backends.s3boto.S3BotoStorage'`, unless there is a compelling reason
+to change it.
+
+There are three ways to set up access to the S3 buckets depending upon speed
+and security. The more secure, the slower it will be to set up.
+
+#### insecure S3 access
+For quick and dirty insecure S3 access, create a single group and a single user
+with full access to all buckets. Full access to all buckets is insecure!
+
+Create an
+[Amazon IAM group](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_CreatingAndListingGroups.html)
+with full access to the S3 bucket. Select the "Amazon S3 Full Access" Policy
+Template.
+
+Create an
+[Amazon IAM user](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html).
+Copy the credentials into the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+environment variables. Be sure to write down the access information, as it
+will only be shown once.
+
+#### secure S3 access
+For secure S3 access, two users will be needed: one with access to the
+Filepicker bucket and one with access to the static hosting bucket.
+
+Note: this might need to be modified to prevent creation and deletion of
+buckets?
+
+Create an
+[Amazon IAM group](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_CreatingAndListingGroups.html)
+with full access to the S3 bucket. The quick way is to select the
+"Amazon S3 Full Access" Policy Template and replace `"Resource": "*"` with
+`"Resource": "arn:aws:s3:::"`.
+
+Create an
+[Amazon IAM user](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html).
+Copy the credentials into the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+environment variables. Be sure to write down the access information, as it
+will only be shown once.
+
+Ensure the created user is a member of the group with access to the S3
+static files bucket.
+
+Repeat the process, creating a group for the Filepicker bucket and
+creating a user with access to that group. These credentials will be passed
+on to Filepicker.
+
+#### somewhat secure S3 access
+Create two groups as described in the `secure S3 access` section above.
+
+Create a single user, save the credentials as described in the
+`insecure S3 access` section above, and pass the credentials on to Filepicker.
+
+Add the single user to both groups.
+
+This is less secure because if your web server or Filepicker gets compromised
+(so there are two points for potential failure), the single compromised
+user has full access to both buckets.
+
+### Amazon Cloudfront CDN
+[Cloudfront CDN](http://aws.amazon.com/cloudfront/) assists static file hosting.
+
+Follow
+[Amazon's instructions](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/GettingStarted.html)
+to host static files out of the appropriate S3 bucket. Note that Django's static
+file upload process has been modified to mark static files as publicly
+accessible.
+
+In the settings for the Cloudfront Distribution, copy the "Domain Name" from
+General settings and set `CLOUDFRONT_DOMAIN` to it. For example, `abcdefghij.cloudfront.net`.
+
+### Amazon Mechanical Turk
+Mechanical Turk is employed to generate human feedback from uploaded notes.
+This service is helpful for generating flash cards and quizzes.
 
-If you need to setup the project for development, it is highly recommend that
-you grab an existing development virtual machine for create one yourself, and
-then configure the virtual machine for production with the steps shown in the
-next section. Instructions for creating a virtual machine follow:
+This service is optional and it might cause unexpected charges when
+deployed. If the required environment variable is not found,
+then no errors will occur and no Mechanical Turk tasks will be created, avoiding any unexpected
+costs.
+
+The `MTURK_HOST` environment variable is almost certainly
+`"mechanicalturk.amazonaws.com"`.
+
+The code will create and publish HITs on your behalf.
+
+### Google Drive
+This software uses [Google Drive](https://developers.google.com/drive/) to
+convert documents to and from various file formats.
+
+A Google Drive service account with access to the Google Drive is required.
+This may be done with a Google Apps account with administrative privileges, or ask
+your business sysadmin.
+
+Follow [Google's instructions](https://developers.google.com/drive/web/auth/web-server)
+to create a Google Drive service account. If using Google Apps, it is worth
+looking at [these instructions](https://developers.google.com/drive/delegation).
+
+Populate the `GOOGLE_USER` environment variable with the email address of the
+user whose Google Drive will be accessed. This is typically your own email
+address.
+
+Google Drive used to use p12 files by default. Now a new-style JSON file is
+downloaded by default when creating new credentials. Until the code has been
+[updated](https://github.com/FinalsClub/karmaworld/issues/437) to use the
+new-style JSON file, make sure to click the `Generate a new P12 key` button.
+
+While on the Credentials page (with the `Generate a new P12 key` button
+visible), note the Service account Email address. It will have a format like
+`numbers-alphanumerics@developer.gserviceaccount.com`. Copy this value and
+paste it into the `GOOGLE_SERVICE_EMAIL` environment variable.
+
+Convert the p12 file into a Base64 encoded string for the
+`GOOGLE_SERVICE_KEY_BASE64` environment variable. There are many ways to do
+this. If Python is available, the
+[binascii library](https://docs.python.org/2/library/binascii.html#binascii.b2a_base64)
+makes this very easy:
+
+    import binascii
+    # The p12 file is binary, so open it in binary mode.
+    with open('file.p12', 'rb') as f:
+        # Call read() to get the bytes, then print them Base64 encoded.
+        print(binascii.b2a_base64(f.read()).decode('ascii'))
 
-1. Install (VirtualBox)[http://www.virtualbox.com/]
+### Filepicker
+This software uses [Filepicker](https://www.filepicker.com/) for uploading
+files. This requires an account with Filepicker.
 
-1. Install (vagrant)[http://www.vagrantup.com/]
+Filepicker can use an additional third party file hosting site where it may
+send uploaded files. This project, in production, uses Amazon S3 as the third
+party. See the Amazon S3 section above for more information.
+
+In development, an S3 bucket will not be necessary. The Free Plan should
+suffice.
 
-1. Use Vagrant to create the virtual machine.
-    * While in `cd {project_root}`, type `vagrant up`
+Create a new App with Web SDK and provide the Heroku App URL for the
+Application's URL. You'll be given an API Key for the App. Paste this into the
+`FILEPICKER_API_KEY` environment variable.
 
-# Production Install
+Find the 'App Security' button on the left hand side of the web site. Make sure
+'Use Security' is enabled. Generate a new app secret. It might require
+reloading the page to see the new secret. Paste this secret into the
+`FILEPICKER_SECRET` environment variable.
 
-If you're starting to work on this project and you need it setup for production,
-follow the steps below.
+If you have an upgraded plan, you can configure Filepicker to have access to
+your Filepicker S3 bucket. Click 'Amazon S3' on the left hand side menu and
+supply the credentials for the user with access to the Filepicker S3 bucket.
 
-1. Ensure the following are installed:
-    * `git`
-    * `PostgreSQL` (server and client)
-    * `Python`
-    * `PIP`
-    * `virtualenv` and `virtualenvwrapper`
+### IndexDen
+KarmaWorld uses IndexDen to create a searchable index of all the notes in the
+system. Create a free IndexDen account at
+[their homepage](http://indexden.com/). You will be given a private URL that
+accesses your IndexDen account. This URL is visible on your dashboard (you
+might need to scroll down).
 
-1. Generate a PostgreSQL database and a role with read/write permissions.
-    * For Debian, these instructions are helpful: https://wiki.debian.org/PostgreSql
+Set the `INDEXDEN_PRIVATE_URL` environment variable to your private URL.
+
+Set the `INDEXDEN_INDEX` environment variable to the name of the index you want
+to use for KarmaWorld. The index will be created automatically when KarmaNotes
+is run if it doesn't already exist. It may be created through the GUI if
+desired.
+
+### Twitter
+
+Twitter is used to post updates about new courses. Access to the Twitter API
+will be required for this task.
+
+If this Twitter feature is desired, the consumer key and secret as well as the
+access token key and secret are needed by the software.
+
+If the required environment variables are not found, then no errors will occur
+and no tweets will be posted.
+
+To set this up,
+[create a new Twitter application](https://dev.twitter.com/apps/new).
+Use your Heroku App URL for the website field. Leave the Callback field blank.
+
+Make sure this application has read/write access. Generate an access token. Go
+to your OAuth settings, and grab the "Consumer key", "Consumer secret",
+"Access token", and "Access token secret". Paste these, respectively, into the
+environment variables `TWITTER_CONSUMER_KEY`, `TWITTER_CONSUMER_SECRET`,
+`TWITTER_ACCESS_TOKEN_KEY`, `TWITTER_ACCESS_TOKEN_SECRET`.
+
+# Local
+
+## Configuring foreman
+
+KarmaNotes runs on Heroku as a webapp and thus makes use of a Procfile. While
+not strictly necessary, KarmaWorld can use the same basic Procfile which is
+convenient and consistent.
+
+To use the Procfile locally, we recommend using `foreman`. To install `foreman`
+and other Heroku tools, install the
+[Heroku toolbelt](https://toolbelt.heroku.com/).
 
-1. Modify configuration files.
-    * There are settings in `{project_root}/karmaworld/settings/dev.py`
-    * There are additional configuration options for external dependencies
-      under `{project_root}/karmaworld/secret/`.
-        * Copy files with the example extension to the corresponding filename
-          without the example extension (e.g.
-          `cp filepicker.py.example filepicker.py`)
-        * Modify those files.
-        * Ensure `PROD_DB_USERNAME`, `PROD_DB_PASSWORD`, and `PROD_DB_NAME`
-          inside `db_settings.py` match the role, password, and database
-          generated in the previous step.
-        * Ensure *.py in `secret/` are never added to the git repo. (.gitignore
-          should help warn against taking this action)
+Ensure environment variables are available to `foreman` by copying
+`.env.example` to `.env` and updating those variables as appropriate for your
+local system.
 
-1. Make sure that you're in the root of the project that you just cloned and
-   run
+## pdf2htmlEX
 
-        fab here first_deploy
+This project uses [pdf2htmlEX](https://github.com/coolwanglu/pdf2htmlEX) as
+a dependency. pdf2htmlEX is used to convert uploaded PDF notes into HTML. It
+needs to be installed on the same system that KarmaWorld is running on.
 
-   This will make a virtualenv, install the development dependencies and create
-   the database tables.
+### using their source
 
-1. Now you can run ``./manage.py runserver`` and visit the site in the browser.
+See their instructions at
+[https://github.com/coolwanglu/pdf2htmlEX/wiki/Building](https://github.com/coolwanglu/pdf2htmlEX/wiki/Building).
 
-# Accessing the Vagrant Virtual Machine
+Make sure to [patch](https://github.com/FinalsClub/pdf2htmlEX/commit/3c19f6abd8d59d1b58cf254b7160b332b3f5b517)
+the source code to expose two variables.
 
-## Connecting to the VM via SSH
-If you have installed a virtual machine using `vagrant up`, you can connect
-to it by running `vagrant ssh` from `{project_root}`.
+### using our fork
 
-## Updating the VM code repository
-Once connected to the virtual machine by SSH, you will see `karmaworld` in
-the home directory. That is the `{project_root}` in the virtual machine.
+You can use FinalsClub's [outdated version of pdf2htmlEX](https://github.com/FinalsClub/pdf2htmlEX).
+See their installation instructions above, but don't worry about patching.
 
-`cd karmaworld` and then use `git fetch; git merge` and/or `git pull origin` as
+### using their PPA
+
+You can use [their upstream PPA](https://launchpad.net/~coolwanglu/+archive/ubuntu/pdf2htmlex).
+
+    apt-add-repository ppa:coolwanglu/pdf2htmlex
+    apt-get update
+    apt-get install pdf2htmlex
+
+Then patch the javascript on your system by running this code in the shell.
+
+    cat >> `dpkg -L pdf2htmlex | grep pdf2htmlEX.js` <`. For running any other
+`manage.py` commands, you should also precede them with `foreman run` like just shown.
+This simply ensures that the environment variables from `.env` are present.
+
+# Heroku Install
+
+KarmaNotes runs on Heroku as a webapp. This section addresses what was done
+for KarmaNotes so that other implementations of KarmaWorld can be run on
+Heroku.
+
+Before anything else, download the [Heroku toolbelt](https://toolbelt.heroku.com/).
+
+To run KarmaWorld on Heroku, do `heroku create` and `git push heroku master` as is typical
+for a Heroku application. Set the variable `BUILDPACK_URL` to
+`https://github.com/FinalsClub/heroku-buildpack-karmanotes` to use a buildpack
+designed to support KarmaNotes.
+
+You will need to import the US Department of Education's list of accredited schools.
+  1. Fetch USDE schools with
+     `heroku run python manage.py fetch_usde_csv ./schools.csv`
+  1. Upload the schools into the database with
+     `heroku run python /manage.py import_usde_csv ./schools.csv`
+  1. Clean up redundant information with
+     `heroku run python /manage.py sanitize_usde_schools`
+
+
+# Django Database management
+
+## South
+
+We have set up Django to use
+[south](http://south.aeracode.org/wiki/QuickStartGuide) for migrations. When
+changing models, it is important to run
+`foreman run python manage.py schemamigration` which will create a migration
+ to reflect the model changes into the database. These changes can be pulled
+into the database with `foreman run python manage.py migrate`.
+
+Sometimes the database already has a migration performed on it, but south
+was never told about it. There are subtleties to the process which
+require looking at the south docs. As a tip, start by looking at the `--fake`
+flag.
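
The South workflow described above can be sketched as a short shell session. This is a minimal sketch under stated assumptions, not the project's documented procedure: `notes` is a hypothetical app label, `0001` is a hypothetical migration number, and the `south` helper is an illustrative wrapper that is not part of KarmaWorld. By default the helper only prints each command; set `RUN=1` to execute it through `foreman`.

```shell
# Hypothetical app label; substitute the Django app whose models changed.
APP="notes"

# Illustrative wrapper (not part of KarmaWorld): prints the command by
# default, and runs it through foreman only when RUN=1 is set.
south() {
    if [ "${RUN:-0}" = "1" ]; then
        foreman run python manage.py "$@"
    else
        echo "foreman run python manage.py $*"
    fi
}

# Create a migration from the current model changes.
south schemamigration "$APP" --auto

# Apply pending migrations to the database.
south migrate "$APP"

# Show which migrations South believes have been applied.
south migrate "$APP" --list

# Record migration 0001 as applied without running it, for a database
# that already contains those changes (0001 is a hypothetical number).
south migrate "$APP" 0001 --fake
```

Printing first makes it easy to review the exact commands before they touch a database, which matters most for `--fake`, since it changes South's bookkeeping without changing the schema.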
+
+# Assets from Third Parties
+
+A number of assets have been added to the repository which come from external
+sources. It would be difficult to keep a complete list in this README and keep
+it up to date. Software which originally came from outside parties can
+generally be found in `karmaworld/assets`.
 
-This may seem like duplication. It is. The duplication allows your host machine
-to maintain git credentials and manage repository access control so that your
-virtual machine doesn't need sensitive information. Your virtual machine simply
-pulls from the local repository on your local file system without needing
-credentials, etc.
+Additionally, all third party Python projects (downloaded and installed with
+pip) are listed in these files:
 
-## Other Vagrant commands
-Please see (vagrant documentation)[http://docs.vagrantup.com/v2/cli/index.html]
-for more information on how to use the vagrant CLI to manage your development
-VM.
+* `requirements.txt`
+* `requirements-dev.txt`
 
-Thanks
-======
+# Thanks
 
 * KarmaNotes.org is a project of the FinalsClub Foundation with generous funding from the William and Flora Hewlett Foundation