Twitter Crawler
Introduction
This project crawls public users' timelines with two different approaches:
- Scrapy
- Selenium + Beautiful Soup

The crawled data is then written to a file in JSON format. A Kafka producer reads the file and produces its data into the 'raw-tweets' topic.
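The file-to-topic step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper names, the file path, the JSON-lines layout, and the `localhost:9092` broker address are assumptions, and the producer uses the third-party `kafka-python` package.

```python
import json

def write_tweets(tweets, path):
    # Persist crawled tweets as one JSON object per line (layout is assumed).
    with open(path, "w", encoding="utf-8") as f:
        for tweet in tweets:
            f.write(json.dumps(tweet) + "\n")

def read_tweets(path):
    # Read the JSON-lines file back into dicts, ready for the producer.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def produce_tweets(path, topic="raw-tweets"):
    # Sketch using kafka-python; assumes a broker listening on localhost:9092.
    from kafka import KafkaProducer
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for tweet in read_tweets(path):
        producer.send(topic, tweet)
    producer.flush()  # block until all queued messages are sent
```

Separating the file I/O from the producer keeps the crawler and the Kafka side decoupled, which matches the two-stage flow described above.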
Technologies/Languages Used
Technology | Usage |
---|---|
Python | Python is the language used for the project. |
Scrapy | Scrapy is used for crawling data from Twitter. |
Selenium | Selenium automates a standalone browser to render dynamic pages. |
Beautiful Soup | Beautiful Soup is used for parsing HTML and XML documents. |
Docker | Docker is used to containerize the services, including the backend and frontend. |
Kafka | Kafka is used to produce the crawled data into a topic. |
Git | Git is used for version control. |
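The Selenium + Beautiful Soup approach pairs the two tools listed above: Selenium renders the JavaScript-heavy timeline, and Beautiful Soup parses the resulting HTML. The sketch below is illustrative only; the `div.tweet-text` selector and the timeline URL are assumptions, since Twitter's real markup differs and changes often.

```python
from bs4 import BeautifulSoup

def extract_tweets(html):
    # Parse rendered HTML; the CSS selector here is a placeholder, not
    # Twitter's actual markup.
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select("div.tweet-text")]

def crawl_timeline(username):
    # Selenium drives a real browser, so pages that plain HTTP requests
    # would return empty (because of JavaScript) render fully.
    from selenium import webdriver
    driver = webdriver.Firefox()
    try:
        driver.get(f"https://twitter.com/{username}")
        return extract_tweets(driver.page_source)
    finally:
        driver.quit()
```

Keeping `extract_tweets` free of any Selenium dependency makes the parsing logic testable against static HTML fixtures.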
Project information
- Category: Software
- Project date: December 2020
Project Description
This project crawls Twitter timelines and produces the crawled data into a Kafka topic.