Ever wondered if Twitter metrics like follower counts and user IDs follow a natural pattern? This project explores whether Benford's Law, a fascinating statistical rule, applies to numerical data from Twitter.
Benford's Law states that in many naturally occurring datasets, lower digits (especially '1') are more likely to appear as the first digit than higher ones (like '9'). This project examines whether Twitter's data aligns with this distribution.
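For reference, Benford's Law gives the expected share of leading digit d as log10(1 + 1/d), so about 30.1% of values should start with 1 and only about 4.6% with 9. The short Python sketch below tabulates those nine probabilities; the function name `benford_expected` is illustrative and not taken from the project's code.

```python
import math


def benford_expected() -> dict[int, float]:
    """Expected probability of each leading digit d = 1..9 under Benford's Law."""
    # P(d) = log10(1 + 1/d); the nine values sum to log10(10) = 1
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, p in benford_expected().items():
        print(f"{digit}: {p:.3f}")
```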
The goal is to analyze whether Twitter metrics, such as follower counts, friend counts, and user IDs, follow Benford's Law.
This analysis can help with:
- Detecting anomalies or suspicious patterns
- Assessing data authenticity
- Differentiating between organic and manipulated growth
Dataset: mock Twitter data
Fields analyzed:
- Follower Count
- Friend Count
- User ID
Methodology:
Data Cleaning
- Removed null, zero, and irrelevant values.
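A minimal pandas sketch of this cleaning step. It assumes the mock data sits in a DataFrame with columns such as `followers_count`, `friends_count`, and `id` (hypothetical names modeled on Twitter API user fields), and it treats entries that cannot be parsed as numbers as the "irrelevant" values; the project's actual rules may differ.

```python
import pandas as pd


def clean_field(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the positive, non-null numeric values of one field."""
    # Coerce to numbers; unparseable entries (e.g. "n/a") become NaN.
    values = pd.to_numeric(df[column], errors="coerce")
    # Drop nulls, including the ones produced by coercion.
    values = values.dropna()
    # Zero and negative values have no leading digit in 1-9, so drop them too.
    return values[values > 0]


if __name__ == "__main__":
    # Hypothetical sample; the column name is assumed, not taken from the project.
    sample = pd.DataFrame({"followers_count": [120, 0, None, "n/a", 45]})
    print(clean_field(sample, "followers_count").tolist())  # [120.0, 45.0]
```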
Digit Extraction
- Extracted the leading digit from each numeric field.
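One way to extract the leading digit, sketched in plain Python (`leading_digit` is a hypothetical helper, not the project's function). Integer-valued inputs such as counts and IDs are read from their decimal string so that large IDs stay exact; other positive numbers fall back to scientific notation, whose first character is the first significant digit.

```python
import math


def leading_digit(x: float) -> int:
    """Return the first significant digit (1-9) of a positive, finite number."""
    if not math.isfinite(x) or x <= 0:
        raise ValueError(f"expected a positive finite number, got {x!r}")
    if float(x).is_integer():
        # Counts and IDs: read the first character of the exact integer string.
        return int(str(int(x))[0])
    # Non-integers: e.g. 0.05 -> "5.000000000000000e-02" -> 5
    return int(f"{x:.15e}"[0])


if __name__ == "__main__":
    print([leading_digit(v) for v in [120, 45.0, 0.05, 987654321012345678]])
```

Applied to a cleaned pandas Series, `values.map(leading_digit)` yields one digit per row.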
Distribution Comparison
- Calculated the actual frequency of leading digits
- Compared with the expected Benford distribution
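A sketch of the comparison, assuming the leading digits are already in a pandas Series (the `compare_to_benford` name and the column labels are illustrative). Observed shares come from `value_counts(normalize=True)`, reindexed so that digits that never occur still appear with a share of 0.

```python
import numpy as np
import pandas as pd

DIGITS = np.arange(1, 10)  # possible leading digits 1..9


def compare_to_benford(digits: pd.Series) -> pd.DataFrame:
    """Observed vs. expected leading-digit shares, indexed by digit."""
    # Share of each digit; digits that never occur are filled with 0.
    observed = digits.value_counts(normalize=True).reindex(DIGITS, fill_value=0.0)
    # Benford's Law: P(d) = log10(1 + 1/d)
    expected = pd.Series(np.log10(1 + 1 / DIGITS), index=DIGITS)
    return pd.DataFrame(
        {"observed": observed, "expected": expected, "difference": observed - expected}
    )


if __name__ == "__main__":
    sample_digits = pd.Series([1, 1, 2, 3, 1, 9, 4, 1, 2, 5])  # hypothetical digits
    print(compare_to_benford(sample_digits).round(3))
```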
Statistical Analysis
- Conducted a Chi-Square Goodness-of-Fit Test
- Visualized actual vs expected digit frequencies
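The goodness-of-fit test can be sketched with `scipy.stats.chisquare`, which compares observed digit counts with expected counts (here, the sample size times the Benford probabilities); with 9 digit categories and no fitted parameters, the test has 8 degrees of freedom. This is an illustrative helper, not the project's code. Two caveats apply: the chi-square approximation needs reasonably large expected counts (a common rule of thumb is at least 5 per digit, i.e. roughly 110 or more values), and with very large samples even small, practically unimportant deviations tend to produce tiny p-values.

```python
import numpy as np
from scipy.stats import chisquare


def benford_chi_square(digits) -> tuple[float, float]:
    """Chi-square goodness-of-fit of leading digits (1-9) against Benford's Law."""
    digits = np.asarray(digits)
    # Observed count of each leading digit 1..9
    observed = np.array([np.count_nonzero(digits == d) for d in range(1, 10)])
    n = observed.sum()
    if n == 0:
        raise ValueError("no leading digits in 1-9 to test")
    # Expected counts: sample size times the Benford probability of each digit
    expected = n * np.log10(1 + 1 / np.arange(1, 10))
    # Default ddof=0 gives k - 1 = 8 degrees of freedom
    result = chisquare(f_obs=observed, f_exp=expected)
    return float(result.statistic), float(result.pvalue)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    uniform_digits = rng.integers(1, 10, size=5000)  # every digit equally likely
    stat, p = benford_chi_square(uniform_digits)
    print(f"chi2 = {stat:.1f}, p = {p:.3g}")  # p is essentially 0: far from Benford
```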
Features:
- Line plots and bar graphs for digit frequencies
- Visual comparison between actual vs. expected Benford distribution
- Separate analysis for different fields (followers, friends, IDs)
- Extendable for anomaly detection
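For the plots, a minimal Matplotlib sketch (Seaborn is left out for brevity): actual shares as bars with the Benford curve overlaid as a line, one chart per field. The function name, labels, and styling are assumptions rather than the project's code; it expects the nine actual shares for digits 1 through 9 from the distribution comparison step.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np


def plot_benford(observed_share, field_name: str, ax=None):
    """Bar chart of actual leading-digit shares with the Benford curve overlaid."""
    digits = np.arange(1, 10)
    expected = np.log10(1 + 1 / digits)  # Benford probabilities for 1..9
    if ax is None:
        _, ax = plt.subplots(figsize=(7, 4))
    ax.bar(digits, observed_share, color="steelblue", alpha=0.7, label="Actual")
    ax.plot(digits, expected, color="crimson", marker="o", label="Benford expected")
    ax.set_xticks(digits)
    ax.set_xlabel("Leading digit")
    ax.set_ylabel("Relative frequency")
    ax.set_title(f"Leading-digit distribution: {field_name}")
    ax.legend()
    return ax
```

Calling `plot_benford(shares, "followers_count").figure.savefig("followers_benford.png")` would write one chart per field; the file name is only an example.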
Tech stack:
- Python
- Pandas, NumPy
- Matplotlib, Seaborn (visualizations)
- SciPy (statistical testing)
Use cases:
- Data Integrity Checks: Detect metric manipulation or spammy activity
- Behavioral Insights: Analyze trends in natural vs. inorganic growth
- Educational Tool: Real-world demonstration of Benford's Law