README.md

   1 # KarmaWorld
__Description__: A Django application for sharing and uploading class notes.
   3
   4 __Copyright__: FinalsClub, a 501c3 non-profit organization
   5
   6 __License__: GPLv3 except where otherwise noted
   7
   8 __Contact__: info@karmanotes.org
   9
  10 v3.0 of the karmanotes.org website from the FinalsClub Foundation
  11
  12 # Purpose
  13
  14 KarmaWorld is an online database of college lecture notes.  KarmaWorld
  15 empowers college students to participate in the free exchange of knowledge.
  16
  17 # Naming
  18
  19 The repository and the project are called KarmaWorld. One implementation
  20 of KarmaWorld, which is run by FinalsClub Foundation, is called
  21 [KarmaNotes](https://www.karmanotes.org/).
  22
  23 # Pre-Installation
  24
  25 ## Code
  26
Before doing anything, you'll need the code. Grab it from GitHub.

Clone the project from the central repo using your GitHub account:

    git clone git@github.com:FinalsClub/karmaworld.git

If your system isn't set up for SSH access to GitHub, grab the project with
this command instead:

    git clone https://github.com/FinalsClub/karmaworld.git
  37
  38 Generally speaking, this will create a subdirectory called `karmaworld` under
  39 the directory where the `git` command was run. This git repository directory
  40 will be referred to herein as `{project_root}`.
  41
  42 There might be some confusion as the git repository's directory will likely be
  43 called `karmaworld` (this is `{project_root}`), but there is also a `karmaworld`
  44 directory underneath that (`{project_root}/karmaworld`) alongside files like
  45 `fabfile.py` (`{project_root}/fabfile.py`) and `README.md`
  46 (`{project_root}/README.md`).
  47
  48 ## External Software Dependencies
  49
  50 ### pdf2htmlEX
  51
  52 KarmaWorld uses [pdf2htmlEX](https://github.com/coolwanglu/pdf2htmlEX) as
  53 a dependency. pdf2htmlEX is used to convert uploaded PDF notes into HTML.
  54
  55 An [outdated version of pdf2htmlEX](https://github.com/FinalsClub/pdf2htmlEX)
  56 is available which includes the
  57 [patch](https://github.com/FinalsClub/pdf2htmlEX/commit/3c19f6abd8d59d1b58cf254b7160b332b3f5b517)
  58 required for pdf2htmlEX to correctly work with KarmaWorld.
  59
  60 Newer versions can be used by applying the patch by hand. It's a fairly
  61 simple two-line modification that can be done after installing
  62 pdf2htmlEX.
  63
  64 ### SSL Certificate
  65
  66 If you wish to host your system publicly, you'll almost certainly want
  67 an SSL certificate signed by a proper authority.
  68
  69 You may need to set the `SSL_REDIRECT` environment variable to `true` to
  70 make KarmaWorld redirect insecure connections to secure ones.
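
How KarmaWorld reads `SSL_REDIRECT` is not shown in this README. As a minimal sketch, assuming the value is compared case-insensitively against a few truthy strings, a settings module could interpret such a flag like this (`env_flag` is a hypothetical helper, not part of the project):

```python
import os


def env_flag(name, environ=os.environ, default=False):
    """Interpret an environment variable as a boolean flag.

    Assumption: "1", "true", "yes" and "on" (any case) count as true.
    """
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


# Example with an explicit mapping instead of the real environment.
print(env_flag("SSL_REDIRECT", {"SSL_REDIRECT": "true"}))
```

Passing the mapping explicitly keeps the helper easy to try out without touching the real environment.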
  71
  72 Follow [Heroku's SSL setup](https://devcenter.heroku.com/articles/ssl-endpoint)
  73 to get SSL running on your server with Heroku.
  74
  75 ## External Service Dependencies
  76
  77 Notice: A number of services are required even if running the KarmaWorld web
  78 service [locally](#local). Some of the services are recommended, and some are
  79 completely optional even if running the web service on Heroku.
  80
  81 This software makes use of external third party services which require
  82 accounts to access the service APIs. Without these third parties available,
  83 this software may require considerable overhaul. These services have API keys,
  84 credentials, and other information that you must provide to KarmaWorld
  85 as environment variables.
  86
  87 The best way to persist these API keys in environment variables is by using a
  88 `.env` file.  Copy `.env.example` to `.env` and populate the fields as required.
  89
  90 Many of these services have free tiers and can be used without charge for
  91 development testing purposes.
  92
  93 * Reminder
  94   * Copy `.env.example` to `.env` and populate the environment variables there.
  95 * Required Services
  96   * [Google Drive](#google-drive)
  97   * [Filepicker](#filepicker)
  98   * [PostgreSQL](#postgresql)
  99   * [Celery](#celery-queue)
 100 * Optional but recommended
 101   * [IndexDen](#indexden): enables searching through courses, notes, etc
 102   * [Heroku](#heroku): the production environment used by karmanotes.org
 103     * it might not be possible to run KarmaWorld on Heroku using a free
 104       webapp.
 105   * [Amazon S3](#s3-for-static-files): for static file hosting
 106 * Entirely optional (though used in the production environment)
 107   * [Twitter](#twitter): share updates about new uploads
 108   * [Amazon Mechanical Turk](#amazon-mechanical-turk): generate quizzes, flashcards, etc
 109   * [Amazon CloudFront](#amazon-cloudfront-cdn)
 110   * [Amazon S3](#s3-for-filepicker): store files uploaded to Filepicker
 111     * Filepicker does not support S3 storage in its free tier
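
Before starting the app, it can help to check that `.env` actually defines the variables for the required services. The sketch below parses simple `KEY=value` lines and reports which names are absent or empty. The `REQUIRED_VARS` list is an illustrative subset of names mentioned in this README, not an authoritative list (consult `.env.example`), and `parse_env` and `missing_vars` are hypothetical helpers.

```python
# Illustrative subset of variable names taken from this README.
REQUIRED_VARS = [
    "GOOGLE_USER",
    "GOOGLE_SERVICE_EMAIL",
    "GOOGLE_SERVICE_KEY_BASE64",
    "FILEPICKER_API_KEY",
    "FILEPICKER_SECRET",
    "CLOUDAMQP_URL",
    "CELERY_QUEUE_NAME",
]


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(text, required=REQUIRED_VARS):
    """Return the required names that are absent or empty in the text."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]


sample = "GOOGLE_USER=me@example.com\nFILEPICKER_API_KEY=\n"
print(missing_vars(sample, ["GOOGLE_USER", "FILEPICKER_API_KEY"]))
```

In practice the text would come from reading the `.env` file in `{project_root}`.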
 112
 113 ### Heroku
This project has chosen to use [Heroku](https://www.heroku.com) to host the Django and
Celery software. While not a hard requirement, the more up-to-date parts of this
documentation assume Heroku is in use.
 117
 118 See README.heroku for more information.
 119
 120 #### pdf2htmlEX on Heroku
 121 If using Heroku, the default
 122 [KarmaNotes Heroku buildpack](https://github.com/FinalsClub/heroku-buildpack-karmanotes)
 123 will [include](https://github.com/FinalsClub/heroku-buildpack-karmanotes/blob/master/bin/steps/pdf2htmlex)
 124 the [required version of pdf2htmlEX](#pdf2htmlex).
 125
 126 ### Celery Queue
Celery uses the Advanced Message Queuing Protocol (AMQP) for passing messages to its workers.

For production, we recommend using Heroku's CloudAMQP add-on, getting your own CloudAMQP account, or
running a queueing system on your own. The `CLOUDAMQP_URL` environment variable must be set correctly
for KarmaWorld to be able to use Celery. The `CELERY_QUEUE_NAME` environment variable
must be set to the name of the queue you wish to use. Setting this to something unique
allows multiple instances of KarmaWorld (or some other software) to share the same queueing server.
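
As a rough sketch of how these two variables might feed Celery, the function below maps them onto the Celery 3.x setting names `BROKER_URL` and `CELERY_DEFAULT_QUEUE`. Whether KarmaWorld's own settings modules use exactly these names is an assumption; `celery_settings` is a hypothetical helper.

```python
import os


def celery_settings(environ=os.environ):
    """Map the two environment variables onto Celery 3.x setting names.

    Raises KeyError if either variable is missing, which surfaces a
    misconfigured environment early.
    """
    return {
        "BROKER_URL": environ["CLOUDAMQP_URL"],
        "CELERY_DEFAULT_QUEUE": environ["CELERY_QUEUE_NAME"],
    }


# Example values for a local RabbitMQ; both values are placeholders.
example = {
    "CLOUDAMQP_URL": "amqp://guest:guest@localhost:5672//",
    "CELERY_QUEUE_NAME": "karmaworld-dev",
}
print(celery_settings(example))
```

Giving each deployment its own `CELERY_QUEUE_NAME` is what lets several instances share one broker without consuming each other's tasks.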
 134
 135 For development on localhost, `RabbitMQ` is the default for `djcelery` and is well supported. Ensure
 136 `RabbitMQ` is installed for local development.
 137
 138 ### PostgreSQL
 139
 140 PostgreSQL is not necessarily required; other RDBMS could probably be fit into
 141 place. However, the code was largely written assuming PostgreSQL will be used.
 142 Change to another system with the caveat that it might take some work.
 143
 144 There are many cloud providers which provide PostgreSQL databases. Heroku has
 145 an add-on for providing a PostgreSQL database. Ensure something like this
 146 is made available and installed to the app.
 147
For local development, ensure a PostgreSQL server is running on localhost or is
otherwise accessible.
 150
 151 ### Amazon S3
 152 The instructions for creating an [S3](http://aws.amazon.com/s3/) bucket may be
 153 [found on Amazon.](http://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html)
 154
Two separate buckets may be used in production: one for static file hosting
and one as a communication bus with Filepicker.
 157
 158 #### S3 for Filepicker
 159
 160 This software uses S3 to store files which are sent to or received
 161 from Filepicker. Filepicker will need to know the S3 bucket name, access key,
 162 and secret key.
 163
 164 Filepicker users can only make use of an S3 bucket with a paid account. For
 165 development purposes, no Filepicker S3 bucket is needed. Skip all references to
 166 the Filepicker S3 bucket in the development case.
 167
 168 The software will not need to know the S3 credentials for the Filepicker
 169 bucket, because the software will upload files to the Filepicker S3 bucket
 170 through Filepicker's API and it will link to or download files from the
 171 Filepicker S3 bucket through Filepicker's URLs. This will be covered in the
 172 [Filepicker section](#filepicker).
 173
 174 #### S3 for static files
 175
 176 This software uses S3 for hosting static files. The software will need to
 177 update static files on the S3 bucket. As such, the software will need the
 178 S3 bucket name, access key, and secret key via the environment variables. This
 179 is described in subsections below.
 180
 181 To support static hosting, `DEFAULT_FILE_STORAGE` should be set to
 182 `'storages.backends.s3boto.S3BotoStorage'`, unless there is a compelling reason
 183 to change it.
 184
There are three ways to set up access to the S3 buckets, trading setup speed
against security. The more secure the option, the longer it takes to set up.
 187
 188 #### insecure S3 access
 189 For quick and dirty insecure S3 access, create a single group and a single user
 190 with full access to all buckets. Full access to all buckets is insecure!
 191
Create an
[Amazon IAM group](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_CreatingAndListingGroups.html)
with full access to the S3 bucket. Select the "Amazon S3 Full Access" Policy
Template.
 196
 197 Create an
 198 [Amazon IAM user](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html).
 199 Copy the credentials into the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
 200 environment variables. Be sure to write down the access information, as it
 201 will only be shown once.
 202
 203 #### secure S3 access
For secure S3 access, two users will be needed: one with access to the
Filepicker bucket and one with access to the static hosting bucket.
 206
 207 Note: this might need to be modified to prevent creation and deletion of
 208 buckets?
 209
Create an
[Amazon IAM group](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_CreatingAndListingGroups.html)
with full access to the S3 bucket. The quick way is to select the
"Amazon S3 Full Access" Policy Template and replace `"Resource": "*"` with
`"Resource": "arn:aws:s3:::<static_bucket_name>"`.
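
To make the policy edit concrete, the sketch below builds a full-access policy document restricted to one bucket and prints it as JSON for pasting into the IAM console. The bucket name is a placeholder and `bucket_policy` is a hypothetical helper. Including the object-level ARN (`<bucket>/*`) is an addition not mentioned above; S3 policies generally need it for actions on the objects inside a bucket.

```python
import json


def bucket_policy(bucket_name):
    """Build an S3 full-access policy limited to a single bucket."""
    bucket_arn = "arn:aws:s3:::{0}".format(bucket_name)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                # The bucket itself, plus every object inside it.
                "Resource": [bucket_arn, bucket_arn + "/*"],
            }
        ],
    }


# "my-static-bucket" is a placeholder name.
print(json.dumps(bucket_policy("my-static-bucket"), indent=2))
```

The same function can be called a second time with the Filepicker bucket name when creating the second group.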
 215
 216 Create an
 217 [Amazon IAM user](http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html).
 218 Copy the credentials into the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
 219 environment variables. Be sure to write down the access information, as it
 220 will only be shown once.
 221
 222 Ensure the created user is a member of the group with access to the S3
 223 static files bucket.
 224
 225 Repeat the process again, creating a group for the Filepicker bucket and
 226 creating a user with access to that group. These credentials will be passed
 227 on to Filepicker.
 228
 229 #### somewhat secure S3 access
 230 Create two groups as described in the `secure S3 access` section above.
 231
 232 Create a single user, save the credentials as described in the
 233 `insecure S3 access` section above, and pass the credentials on to Filepicker.
 234
 235 Add the single user to both groups.
 236
This is less secure because if either your web server or Filepicker gets
compromised (so there are two potential points of failure), the single
compromised user has full access to both buckets.
 240
 241 ### Amazon Cloudfront CDN
[CloudFront CDN](http://aws.amazon.com/cloudfront/) assists static file hosting.

Follow
[Amazon's instructions](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/GettingStarted.html)
to host static files out of the appropriate S3 bucket. Note that Django's static
file upload process has been modified to mark static files as publicly
accessible.

In the settings for the CloudFront Distribution, copy the "Domain Name" from
General settings and set `CLOUDFRONT_DOMAIN` to it. For example, `abcdefghij.cloudfront.net`.
 252
 253 ### Amazon Mechanical Turk
Mechanical Turk is employed to generate human feedback from uploaded notes.
This service is helpful for generating flash cards and quizzes.

This service is optional, and it might cause unexpected charges when
deployed. If the required environment variable is not found, no errors
will occur and no Mechanical Turk tasks will be created, avoiding any
unexpected costs.
 261
 262 The `MTURK_HOST` environment variable is almost certainly
 263 `"mechanicalturk.amazonaws.com"`.
 264
 265 The code will create and publish HITs on your behalf.
 266
 267 ### Google Drive
 268 This software uses [Google Drive](https://developers.google.com/drive/) to
 269 convert documents to and from various file formats.
 270
A Google Drive service account with access to the Google Drive is required.
Creating one may require a Google Apps account with administrative privileges;
if you do not have one, ask your organization's sysadmin.
 274
 275 Follow [Google's instructions](https://developers.google.com/drive/web/auth/web-server)
 276 to create a Google Drive service account. If using Google Apps, it is worth
 277 looking at [these instructions](https://developers.google.com/drive/delegation).
 278
 279 Populate the `GOOGLE_USER` environment variable with the email address of the
 280 user whose Google Drive will be accessed. This is typically your own email
 281 address.
 282
 283 Google Drive used to use p12 files by default. Now a new-style JSON file is
 284 downloaded by default when creating new credentials. Until the code has been
 285 [updated](https://github.com/FinalsClub/karmaworld/issues/437) to use the
 286 new-style JSON file, make sure to click the `Generate a new P12 key` button.
 287
 288 While on the Credentials page (with the `Generate a new P12 key` button
 289 visible), note the Service account Email address. It will have a format like
 290 `numbers-alphanumerics@developer.gserviceaccount.com`. Copy this value and
 291 paste it into the `GOOGLE_SERVICE_EMAIL` environment variable.
 292
 293 Convert the p12 file into a Base64 encoded string for the
 294 `GOOGLE_SERVICE_KEY_BASE64` environment variable. There are many ways to do
 295 this. If Python is available, the
 296 [binascii library](https://docs.python.org/2/library/binascii.html#binascii.b2a_base64)
 297 makes this very easy:
 298
        import binascii
        # open in binary mode: a p12 file is not text
        with open('file.p12', 'rb') as f:
            print(binascii.b2a_base64(f.read()).decode('ascii'))
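
Note that `binascii.b2a_base64` appends a trailing newline, which may be unwelcome in a `.env` value. As an alternative sketch, `base64.b64encode` produces a single line with no newline. The round trip below uses made-up bytes in place of a real p12 file; how KarmaWorld decodes the variable internally is an assumption, and both function names are hypothetical.

```python
import base64
import binascii


def encode_key(raw_bytes):
    """Encode key bytes as a single-line Base64 string (no newline)."""
    return base64.b64encode(raw_bytes).decode("ascii")


def decode_key(encoded):
    """Recover the original key bytes from the Base64 string."""
    return binascii.a2b_base64(encoded)


# Stand-in bytes; a real run would read file.p12 in binary mode.
fake_key = b"\x30\x82\x00\x01 not a real p12 file"
encoded = encode_key(fake_key)
assert decode_key(encoded) == fake_key
print(encoded)
```

Checking the round trip before pasting the value into `.env` catches copy and encoding mistakes early.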
 302
 303 ### Filepicker
 304 This software uses [Filepicker](https://www.filepicker.com/) for uploading
 305 files. This requires an account with Filepicker.
 306
 307 Filepicker can use an additional third party file hosting site where it may
 308 send uploaded files. This project, in production, uses Amazon S3 as the third
 309 party. See the Amazon S3 section above for more information.
 310
 311 In development, an S3 bucket will not be necessary. The Free Plan should
 312 suffice.
 313
 314 Create a new App with Web SDK and provide the Heroku App URL for the
 315 Application's URL. You'll be given an API Key for the App. Paste this into the
 316 `FILEPICKER_API_KEY` environment variable.
 317
 318 Find the 'App Security' button on the left hand side of the web site. Make sure
 319 'Use Security' is enabled. Generate a new app secret. It might require
 320 reloading the page to see the new secret. Paste this secret into the
 321 `FILEPICKER_SECRET` environment variable.
 322
 323 If you have an upgraded plan, you can configure Filepicker to have access to
 324 your Filepicker S3 bucket. Click 'Amazon S3' on the left hand side menu and
 325 supply the credentials for the user with access to the Filepicker S3 bucket.
 326
 327 ### IndexDen
KarmaWorld uses IndexDen to create a searchable index of all the notes in the
system. Create a free IndexDen account at
[their homepage](http://indexden.com/). You will be given a private URL that
accesses your IndexDen account. This URL is visible on your dashboard (you
might need to scroll down).
 333
 334 Set the `INDEXDEN_PRIVATE_URL` environment variable to your private URL.
 335
Set the `INDEXDEN_INDEX` environment variable to the name of the index you want
to use for KarmaWorld. The index will be created automatically when KarmaWorld
is run if it doesn't already exist. It may be created through the GUI if
desired.
 340
 341 ### Twitter
 342
 343 Twitter is used to post updates about new courses. Access to the Twitter API
 344 will be required for this task.
 345
 346 If this Twitter feature is desired, the consumer key and secret as well as the
 347 access token key and secret are needed by the software.
 348
 349 If the required environment variables are not found, then no errors will occur
 350 and no tweets will be posted.
 351
 352 To set this up,
 353 [create a new Twitter application](https://dev.twitter.com/apps/new).
 354 Use your Heroku App URL for the website field. Leave the Callback field blank.
 355
 356 Make sure this application has read/write access. Generate an access token. Go
 357 to your OAuth settings, and grab the "Consumer key", "Consumer secret",
 358 "Access token", and "Access token secret". Paste these, respectively, into the
 359 environment variables `TWITTER_CONSUMER_KEY`, `TWITTER_CONSUMER_SECRET`,
 360 `TWITTER_ACCESS_TOKEN_KEY`, `TWITTER_ACCESS_TOKEN_SECRET`.
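
The "skip quietly when unconfigured" behavior described above can be sketched as follows. This is not the project's actual code: `twitter_credentials` and `maybe_tweet` are hypothetical helpers, and `post` stands in for whatever function really talks to the Twitter API.

```python
import os

TWITTER_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN_KEY",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def twitter_credentials(environ=os.environ):
    """Return the four credentials as a dict, or None if any is missing."""
    values = {name: environ.get(name) for name in TWITTER_VARS}
    if not all(values.values()):
        return None
    return values


def maybe_tweet(message, post, environ=os.environ):
    """Call post(credentials, message) only when Twitter is configured."""
    credentials = twitter_credentials(environ)
    if credentials is None:
        return False  # not configured: skip without raising an error
    post(credentials, message)
    return True


# With an empty environment nothing is posted and nothing fails.
print(maybe_tweet("new notes uploaded", lambda creds, msg: None, {}))
```

Treating a partially configured environment the same as an unconfigured one avoids failing halfway through an API call.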
 361
 362 # Local
 363
 364 ## Configuring foreman
 365
 366 KarmaNotes runs on Heroku as a webapp and thus makes use of a Procfile. While
 367 not strictly necessary, KarmaWorld can use the same basic Procfile which is
 368 convenient and consistent.
 369
 370 To use the Procfile locally, we recommend using `foreman`. To install `foreman`
 371 and other Heroku tools, install the
 372 [Heroku toolbelt](https://toolbelt.heroku.com/).
 373
Ensure environment variables are available to `foreman` by copying
`.env.example` to `.env` and updating those variables as appropriate for your
local system.
 377
 378 ## pdf2htmlEX
 379
 380 This project uses [pdf2htmlEX](https://github.com/coolwanglu/pdf2htmlEX) as
 381 a dependency. pdf2htmlEX is used to convert uploaded PDF notes into HTML. It
 382 needs to be installed on the same system that KarmaWorld is running on.
 383
 384 ### using their source
 385
 386 See their instructions at
 387 [https://github.com/coolwanglu/pdf2htmlEX/wiki/Building](https://github.com/coolwanglu/pdf2htmlEX/wiki/Building).
 388
 389 Make sure to [patch](https://github.com/FinalsClub/pdf2htmlEX/commit/3c19f6abd8d59d1b58cf254b7160b332b3f5b517)
 390 the source code to expose two variables.
 391
 392 ### using our fork
 393
 394 You can use FinalsClub's [outdated version of pdf2htmlEX](https://github.com/FinalsClub/pdf2htmlEX).
 395 See their installation instructions above, but don't worry about patching.
 396
 397 ### using their PPA
 398
You can use [their upstream PPA](https://launchpad.net/~coolwanglu/+archive/ubuntu/pdf2htmlex).

        apt-add-repository ppa:coolwanglu/pdf2htmlex
        apt-get update
        apt-get install pdf2htmlex

Then patch the javascript on your system by running this code in the shell.

        cat >> `dpkg -L pdf2htmlex | grep pdf2htmlEX.js` <<PDF2HTMLEXHACK
        Viewer.prototype['rescale'] = Viewer.prototype.rescale;
        Viewer.prototype['scroll_to'] = Viewer.prototype.scroll_to;
        PDF2HTMLEXHACK
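
The shell snippet above appends the two lines every time it runs. As an alternative sketch, the Python function below appends only the lines that are not already present, so it is safe to run repeatedly. `patch_viewer_js` is a hypothetical helper; the path to pass in is whatever `dpkg -L pdf2htmlex | grep pdf2htmlEX.js` prints on your system, and writing to it will likely require root.

```python
PATCH_LINES = [
    "Viewer.prototype['rescale'] = Viewer.prototype.rescale;",
    "Viewer.prototype['scroll_to'] = Viewer.prototype.scroll_to;",
]


def patch_viewer_js(path):
    """Append each export line to the file unless it is already there.

    Returns the number of lines that were appended.
    """
    with open(path) as f:
        content = f.read()
    missing = [line for line in PATCH_LINES if line not in content]
    if missing:
        with open(path, "a") as f:
            # Start on a fresh line if the file lacks a trailing newline.
            if content and not content.endswith("\n"):
                f.write("\n")
            f.write("\n".join(missing) + "\n")
    return len(missing)
```

A first call on an unpatched file returns 2; calling it again returns 0 and leaves the file unchanged.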
 411
 412 ## Install
 413
  1. `virtualenv venv`
  1. `source venv/bin/activate`
  1. `pip install -r requirements.txt`
    * on Debian systems, some packages are required for pip to succeed:
    * `apt-get install python-dev libpython-dev python-psycopg2 libmemcached-dev libffi-dev libssl-dev postgresql-server-dev-X.Y libxml2-dev libxslt-dev`
  1. `pip install -r requirements-dev.txt`
 420
 421 ## Configuration
 422
 423 Make sure [External Service Dependencies](#external-service-dependencies) are
 424 satisfied. This includes running a local database and RabbitMQ instance as
 425 desired.
 426
 427   1. configure `.env` as per [instructions](#external-service-dependencies)
 428   1. `foreman run python manage.py syncdb --migrate --noinput`
 429   1. `foreman run python manage.py createsuperuser`
 430   1. `foreman run python manage.py fetch_usde_csv ./schools.csv`
 431   1. `foreman run python manage.py import_usde_csv ./schools.csv`
 432   1. `foreman run python manage.py sanitize_usde_schools`
 433
* `fetch_usde_csv` downloads school records and stores them to `./schools.csv`. This file name
     and location are arbitrary. As long as the same file is passed into `import_usde_csv` for
     reading, everything should be fine.

* `fetch_usde_csv` requires `7zip` to be installed for processing compressed
     archives. On Debian-based systems, this entails `apt-get install p7zip-full`.
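
The non-interactive steps above can be scripted. The sketch below builds the same `foreman run python manage.py ...` command lines and runs them in order, stopping at the first failure. `createsuperuser` is left out because it prompts for input. Both function names are hypothetical, and the script assumes it is run from `{project_root}` with the virtual environment active.

```python
import subprocess


def bootstrap_commands(csv_path="./schools.csv"):
    """Return the setup commands from this section as argument lists."""
    manage = ["foreman", "run", "python", "manage.py"]
    return [
        manage + ["syncdb", "--migrate", "--noinput"],
        manage + ["fetch_usde_csv", csv_path],
        # The same file must be passed to fetch and import.
        manage + ["import_usde_csv", csv_path],
        manage + ["sanitize_usde_schools"],
    ]


def run_bootstrap(csv_path="./schools.csv"):
    """Run each command in order; check_call raises on the first failure."""
    for argv in bootstrap_commands(csv_path):
        subprocess.check_call(argv)


# Print the commands without executing them.
for argv in bootstrap_commands():
    print(" ".join(argv))
```

Keeping the command list separate from the runner makes it easy to review what will be executed before calling `run_bootstrap()`.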
 440
 441 If using `DJANGO_SETTINGS_MODULE='karmaworld.settings.dev'` in `.env`, static
 442 file hosting should be done by local files.  `DEFAULT_FILE_STORAGE` should be
 443 set to `django.core.files.storage.FileSystemStorage`.
 444
 445 If using `DJANGO_SETTINGS_MODULE='karmaworld.settings.prod'` in `.env`, static
 446 file hosting is done by `DEFAULT_FILE_STORAGE` defined in `.env`.
 447
 448 ## Run
 449
 450 Make sure you are inside your virtual environment (`source venv/bin/activate`).
 451
 452 Static file hosting makes use of compression if using
 453 `DJANGO_SETTINGS_MODULE='karmaworld.settings.prod'`. If using compression and
 454 the code has changed or this is the first run, make sure any modified static
 455 files get compressed with `foreman run python manage.py compress`. Static files
 456 then need to be uploaded correctly with `foreman run python manage.py
 457 collectstatic`.
 458
 459 Run `foreman start`.  `foreman` will load the `.env` file and manage running all
 460 processes in a way that is similar to that of Heroku. This allows better
 461 consistency with local, staging, and production deployments.
 462
To run only the web process, without celery or beat, run `foreman start web`.

Press ctrl-C to kill foreman. Foreman will run Django's runserver command.
If you wish to have more control over how this is done, you can do
`foreman run python manage.py runserver <options>`. Any other
`manage.py` command should also be preceded by `foreman run`, as just shown.
This simply ensures that the environment variables from `.env` are present.
 471
 472 # Heroku Install
 473
 474 KarmaNotes runs on Heroku as a webapp. This section addresses what was done
 475 for KarmaNotes so that other implementations of KarmaWorld can be run on
 476 Heroku.
 477
 478 Before anything else, download the [Heroku toolbelt](https://toolbelt.heroku.com/).
 479
To run KarmaWorld on Heroku, do `heroku create` and `git push heroku master` as typical
for a Heroku application. Set the `BUILDPACK_URL` config variable to
`https://github.com/FinalsClub/heroku-buildpack-karmanotes` to use a buildpack
designed to support KarmaNotes.

You will need to import the US Department of Education's list of accredited schools.

   1. Fetch USDE schools with
      `heroku run python manage.py fetch_usde_csv ./schools.csv`
   1. Upload the schools into the database with
      `heroku run python manage.py import_usde_csv ./schools.csv`
   1. Clean up redundant information with
      `heroku run python manage.py sanitize_usde_schools`
 492
 493
 494 # Django Database management
 495
 496 ## South
 497
We have set up Django to use
[South](http://south.aeracode.org/wiki/QuickStartGuide) for migrations. When
changing models, it is important to run
`foreman run python manage.py schemamigration` (South expects the app name,
and usually `--auto`, as arguments), which will create a migration
to reflect the model changes into the database. These changes can be applied
to the database with `foreman run python manage.py migrate`.
 504
Sometimes the database already has a migration applied to it, but South
was never told about it. There are subtleties to the process which
require looking at the South docs. As a tip, start by looking at the `--fake`
flag.
 509
 510 # Assets from Third Parties
 511
 512 A number of assets have been added to the repository which come from external
 513 sources. It would be difficult to keep a complete list in this README and keep
 514 it up to date. Software which originally came from outside parties can
 515 generally be found in `karmaworld/assets`.
 516
 517 Additionally, all third party Python projects (downloaded and installed with
 518 pip) are listed in these files:
 519
 520 * `requirements.txt`
 521 * `requirements-dev.txt`
 522
 523 # Thanks
 524
 525 * KarmaNotes.org is a project of the FinalsClub Foundation with generous funding from the William and Flora Hewlett Foundation
 526
 527 * Also thanks to [rdegges](https://github.com/rdegges/django-skel) for the django-skel template