Twitter Crawler
Introduction
This project crawls public users' timelines with two different approaches:
- Scrapy
- Selenium + Beautiful Soup

The crawled data is then written to a file in JSON format. A Kafka producer reads the file and produces its data into the 'raw-tweets' topic.
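The file-to-topic step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper names, the file path, the JSON-lines layout, and the `localhost:9092` broker address are assumptions, and the producer uses the third-party `kafka-python` package.

```python
import json

def write_tweets(tweets, path):
    # Persist crawled tweets as one JSON object per line (layout is assumed).
    with open(path, "w", encoding="utf-8") as f:
        for tweet in tweets:
            f.write(json.dumps(tweet) + "\n")

def read_tweets(path):
    # Read the JSON-lines file back into dicts, ready for the producer.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def produce_tweets(path, topic="raw-tweets"):
    # Sketch using kafka-python; assumes a broker listening on localhost:9092.
    from kafka import KafkaProducer
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for tweet in read_tweets(path):
        producer.send(topic, tweet)
    producer.flush()  # block until all queued messages are sent
```

Separating the file I/O from the producer keeps the crawler and the Kafka side decoupled, which matches the two-stage flow described above.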
Technologies/Languages Used
Technology | Usage |
---|---|
Python | Python is the language used for the project. |
Scrapy | Scrapy is used for crawling data from Twitter. |
Selenium | Selenium automates a standalone browser to render dynamic pages. |
Beautiful Soup | Beautiful Soup is used for parsing HTML and XML documents. |
Docker | Docker is used to containerize the services, including the backend and frontend. |
Kafka | Kafka is used to produce the crawled data into a topic. |
Git | Git is used for version control. |
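The Selenium + Beautiful Soup approach pairs the two tools listed above: Selenium renders the JavaScript-heavy timeline, and Beautiful Soup parses the resulting HTML. The sketch below is illustrative only; the `div.tweet-text` selector and the timeline URL are assumptions, since Twitter's real markup differs and changes often.

```python
from bs4 import BeautifulSoup

def extract_tweets(html):
    # Parse rendered HTML; the CSS selector here is a placeholder, not
    # Twitter's actual markup.
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select("div.tweet-text")]

def crawl_timeline(username):
    # Selenium drives a real browser, so pages that plain HTTP requests
    # would return empty (because of JavaScript) render fully.
    from selenium import webdriver
    driver = webdriver.Firefox()
    try:
        driver.get(f"https://twitter.com/{username}")
        return extract_tweets(driver.page_source)
    finally:
        driver.quit()
```

Keeping `extract_tweets` free of any Selenium dependency makes the parsing logic testable against static HTML fixtures.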
Project information
- Category: Software
- Project date: December 2020
Project Description
This project crawls Twitter timelines and produces the crawled data into a Kafka topic.