COVID SHORTS
A Serverless News Summariser for COVID-19 news from Indian states and countries around the world. Your one-stop source for the top COVID-related news, in a concise form.
Created on 26th March 2020
The problem COVID SHORTS solves
With the recent COVID-19 outbreak, pandemic-related news is coming from every direction. To keep you updated,
COVID SHORTS brings you only the top, relevant stories, each with a short summary of the article. The application has
a serverless architecture and runs its entire infrastructure on the AWS Cloud, making it easily scalable and fast. The application
can also detect the language of an article, so a top story written in a regional language still shows up in your feed. The UI is minimal and easy to use. At present the top 5 articles are shown (if there are 5 relevant stories), but this is not a hard limit and
can easily be changed in future versions.
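To illustrate how the 5-article limit can stay a tunable setting rather than a hard-coded constant, here is a minimal sketch of a Python Lambda handler. It is not the project's code: it assumes articles arrive already scored and flagged as relevant in a hypothetical `event["articles"]` list, and it reads the limit from a hypothetical `MAX_ARTICLES` environment variable.

```python
import json
import os

DEFAULT_MAX_ARTICLES = 5  # the current limit described above


def select_top_stories(articles, limit):
    """Return at most `limit` relevant articles, highest score first.

    Assumes each article is a dict with hypothetical "relevant" and "score" keys.
    """
    relevant = [a for a in articles if a.get("relevant")]
    relevant.sort(key=lambda a: a.get("score", 0), reverse=True)
    return relevant[:limit]


def lambda_handler(event, context):
    # MAX_ARTICLES is a hypothetical environment variable; changing it in the
    # Lambda configuration changes the feed size without a code change.
    limit = int(os.environ.get("MAX_ARTICLES", DEFAULT_MAX_ARTICLES))
    stories = select_top_stories(event.get("articles", []), limit)
    return {"statusCode": 200, "body": json.dumps(stories)}
```

If fewer than `limit` relevant stories exist, the slice simply returns all of them, which matches the "if there are 5 relevant stories" behaviour.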
Challenges I ran into
There were many, but relentless debugging solved all of them:
- When creating the deployment environment for an AWS Lambda function that uses AWS Lambda Layers, the packages
must be built in a Linux environment compatible with the Lambda runtime (Lambda functions run on Amazon Linux), and the deployment directory must follow a particular layout. I am on Windows, and packages with compiled extensions built on Windows will not load on Lambda's Linux runtime, so there were issues.
Solution: Build the environment on a Linux EC2 instance (I used an Ubuntu AMI; an Amazon Linux AMI matches the Lambda runtime most closely) to avoid these errors. A typical directory layout looks like
python/lib/python3.6/site-packages
pip install all dependencies into site-packages and zip the python folder.
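The packaging steps above can be scripted. The following is a minimal sketch, assuming it runs on the Linux build machine (for example the EC2 instance) with the same Python version as the Lambda runtime; the function name `build_layer` and its parameters are hypothetical, not part of the project. It creates the python/lib/python3.6/site-packages layout, installs dependencies into it with pip's `--target` option, and zips the top-level python folder.

```python
import os
import subprocess
import sys
import zipfile


def build_layer(requirements, build_dir, zip_path, python_version="3.6"):
    """Hypothetical helper: install `requirements` into the layer layout
    python/lib/python<version>/site-packages and zip the python/ folder."""
    site_packages = os.path.join(
        build_dir, "python", "lib", "python" + python_version, "site-packages"
    )
    os.makedirs(site_packages, exist_ok=True)

    if requirements:
        # Uses the interpreter running this script, so run it with the same
        # Python version as the Lambda runtime (e.g. python3.6).
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "--target", site_packages]
            + list(requirements),
            check=True,
        )

    # Archive paths must start at python/ so the layer's packages are importable.
    python_root = os.path.join(build_dir, "python")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(python_root):
            zf.write(dirpath, os.path.relpath(dirpath, build_dir))  # directory entry
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                zf.write(full_path, os.path.relpath(full_path, build_dir))
    return zip_path
```

For example, `build_layer(["nltk"], "build", "layer.zip")` would produce a zip suitable for uploading as a layer. Keep `zip_path` outside `build_dir` so the archive does not try to include itself.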
- The project uses the nltk library. Loading its 'punkt' tokenizer model requires nltk.download('punkt'), but a simple
pip3 install nltk installs only the code, not the model data, so a deployment environment built that way does not contain
'punkt' in any directory nltk searches on AWS Lambda, and the default download locations there are not writable.
Solution: A workaround (courtesy of StackOverflow) is to use the /tmp directory, which is writable inside Lambda. After importing nltk, add /tmp to nltk's data search path:
import nltk
from nltk.tokenize import word_tokenize

nltk.data.path.append("/tmp")
and then download the model into it:
nltk.download("punkt", download_dir="/tmp")
This should work :)
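Lambda can reuse a warm container between invocations, and files written to /tmp survive for as long as that container lives, so the download only needs to happen when the model is missing. The sketch below assumes the same punkt model and /tmp path as above; the helper name `ensure_punkt` is hypothetical. It relies on `nltk.data.find`, which raises `LookupError` when a resource is not on `nltk.data.path`.

```python
def ensure_punkt(download_dir="/tmp"):
    """Hypothetical helper: make punkt available, downloading it only if needed.

    Returns True if a download happened, False if the model was already present
    (e.g. left in /tmp by a previous invocation in a warm container).
    """
    import nltk  # imported here so the module still loads where nltk is absent

    if download_dir not in nltk.data.path:
        nltk.data.path.append(download_dir)
    try:
        nltk.data.find("tokenizers/punkt")
        return False
    except LookupError:
        nltk.download("punkt", download_dir=download_dir, quiet=True)
        return True
```

Calling it at module load or at the start of the handler means only cold starts pay the download cost. Recent NLTK releases may look for a resource named punkt_tab for word_tokenize instead, so check the resource name against the installed version.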
There were many small challenges along the way, but these two hurdles took a substantial amount of time.