In the simplest terms, it takes video as input and describes the world to the blind.
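As a rough illustration of the video-in, description-out flow, here is a minimal Python sketch. It is not the project's actual code: `sample_frames`, `describe_video`, and the `caption_frame` callback are hypothetical names, the 2-second sampling interval is an assumed default, and the captioning model itself is left as a placeholder that the caller supplies.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_frames(
    frames: Iterable[Frame], fps: float, every_seconds: float
) -> Iterator[Tuple[float, Frame]]:
    """Yield (timestamp_in_seconds, frame) roughly once per `every_seconds`."""
    if fps <= 0 or every_seconds <= 0:
        raise ValueError("fps and every_seconds must be positive")
    # Number of frames to skip between samples (at least every frame).
    step = max(1, round(fps * every_seconds))
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index / fps, frame


def describe_video(
    frames: Iterable[Frame],
    fps: float,
    caption_frame: Callable[[Frame], str],  # hypothetical: wraps whatever model is used
    every_seconds: float = 2.0,  # assumed default, not from the project
) -> List[str]:
    """Caption sampled frames and drop consecutive duplicate captions."""
    descriptions: List[str] = []
    previous = None
    for timestamp, frame in sample_frames(frames, fps, every_seconds):
        caption = caption_frame(frame)
        # Skip repeats so the listener is not told the same thing twice in a row.
        if caption != previous:
            descriptions.append(f"[{timestamp:.1f}s] {caption}")
            previous = caption
    return descriptions
```

Sampling only every few seconds keeps the number of model calls small, which matters when no GPU is available. The frames can come from any video decoder, and turning the resulting text into speech is outside this sketch.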
Lack of a GPU for efficient model training.
Lack of good-quality data for efficient training.
Lots of bugs encountered, but all were fixed.