
Coursearch

A one-stop solution for navigating the endless sea of online courses.

The problem Coursearch solves

Under the Open Innovation Track, we built a MOOC-crawling web application that mines course information from Coursera, Udemy, Udacity and Pluralsight. Given any search term (e.g. machine learning, philosophy, graphic design), the application scrapes data on the matching courses from these platforms. It then combines each course's number of reviews and its rating into a single ranking score. The results from all platforms are merged into one list, so users can pick the best-ranked course across every platform, and they can also sort and search within that list.
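The write-up does not specify how reviews and ratings are combined. One common way to do it, shown here only as an assumption and not as Coursearch's actual formula, is a Bayesian average: a course's rating is blended with the mean rating of all results, weighted by how many reviews back it. The `Course` class, `weighted_score`, `rank_courses` and the `prior_weight` default of 100 are hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Course:
    title: str
    platform: str
    rating: float      # average star rating, assumed to be on a 0-5 scale
    num_reviews: int


def weighted_score(course: Course, mean_rating: float, prior_weight: int = 100) -> float:
    """Bayesian average: ratings backed by few reviews are pulled toward the mean."""
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")
    v, m = course.num_reviews, prior_weight
    return (v / (v + m)) * course.rating + (m / (v + m)) * mean_rating


def rank_courses(courses: list[Course], prior_weight: int = 100) -> list[Course]:
    """Return courses from every platform, best weighted score first."""
    if not courses:
        return []
    mean_rating = sum(c.rating for c in courses) / len(courses)
    return sorted(
        courses,
        key=lambda c: weighted_score(c, mean_rating, prior_weight),
        reverse=True,
    )
```

For example, ranking a 4.9-star course with 5 reviews, a 4.6-star course with 25,000 reviews and a 4.2-star course with 800 reviews (default `prior_weight=100`) puts the 25,000-review course first, because the 4.9 rating rests on too few reviews to outweigh the overall mean.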

As an added option, we will also offer an API microservice whose endpoints are defined in our documentation. Anyone can pass search parameters to it and receive a consolidated list of crawled data from all the platforms.
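The actual endpoints live in the project's documentation and are not reproduced here. As an illustration only, this is a minimal standard-library sketch of the consolidation step such an endpoint could perform: parse the query string, call each platform's crawler, tag every result with its platform and return one JSON document. The `q` and `platform` parameter names, the `handle_search` function and the crawler callables are all hypothetical.

```python
from __future__ import annotations

import json
from typing import Callable, Dict, List
from urllib.parse import parse_qs

# Hypothetical crawler interface: maps a search term to a list of course dicts.
Crawler = Callable[[str], List[dict]]


def handle_search(query_string: str, crawlers: Dict[str, Crawler]) -> str:
    """Parse 'q=...&platform=...' parameters and return consolidated JSON."""
    params = parse_qs(query_string)
    term = params.get("q", [""])[0].strip()
    if not term:
        return json.dumps({"error": "missing required parameter 'q'"})

    # Optional, repeatable 'platform' filter; default to every registered crawler.
    wanted = params.get("platform") or list(crawlers)

    results = []
    for name in wanted:
        crawl = crawlers.get(name)
        if crawl is None:
            continue  # skip platform names we have no crawler for
        for course in crawl(term):
            results.append({**course, "platform": name})

    return json.dumps({"query": term, "count": len(results), "results": results})
```

Keeping the handler a plain function of the query string makes it easy to mount under whichever web framework serves the microservice, and to test without starting a server.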

Challenges we ran into

  1. Creating web crawlers for dynamic sites like Pluralsight was very hard: we had to parse the page into XML and then extract the data from it!

  2. We built a web crawler for Udacity but could not get it fully working. Finishing it remains future scope for this project.

  3. Our spiders stopped working once the app was used for more than one search. After hours of debugging, we found that Crochet solved the issue!

  4. Our crawlers needed ChromeDriver, but we had no idea how to deploy it to Heroku. We solved this with heroku-chromedriver!

  5. The data table used to render the results had trouble resizing. We fixed it with manual CSS breakpoints!
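Challenge 1 mentions parsing a page into XML before extracting data. Below is a minimal sketch of that extraction step using Python's standard `xml.etree.ElementTree`, assuming the dynamic page has already been rendered and converted into well-formed XML. The markup, class names and the `extract_courses` helper are hypothetical; real Pluralsight pages are structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified snapshot of a results page after conversion to
# well-formed XML; real markup and class names will differ.
SAMPLE_XML = """
<results>
  <div class="course-card">
    <h3 class="title">Python Fundamentals</h3>
    <span class="rating">4.7</span>
    <span class="reviews">1,234</span>
  </div>
  <div class="course-card">
    <h3 class="title">Intro to Graphic Design</h3>
    <span class="reviews">56</span>
  </div>
</results>
"""


def extract_courses(xml_text: str) -> list:
    """Pull title, rating and review count out of every course card."""
    root = ET.fromstring(xml_text.strip())
    courses = []
    # ElementTree supports a small XPath subset, including [@attr='value'].
    for card in root.iterfind(".//div[@class='course-card']"):
        title = (card.findtext("h3[@class='title']") or "").strip()
        rating_text = card.findtext("span[@class='rating']")
        reviews_text = card.findtext("span[@class='reviews']")
        courses.append({
            "title": title,
            # Missing fields become None / 0 instead of raising.
            "rating": float(rating_text) if rating_text else None,
            "num_reviews": int(reviews_text.replace(",", "")) if reviews_text else 0,
        })
    return courses
```

ElementTree only accepts well-formed XML, which is why a conversion step like the one described is needed before raw HTML can be queried this way.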
