Post on 15-Apr-2017
Black, White, & Blue
Team CDTW
Members: Chris Smith, Dwayne Jones, Todd Rutherford, & Willie V. Ward III
Purpose & Disclaimer
Purpose: The purpose of this presentation is to summarize the team’s research and findings, along with any associated subject matter. The information in this presentation is intended to be informational, as it summarizes the avenue of research performed by the research group.
Disclaimer: The content of this presentation is not meant to offend any party in the physical or virtual audience, nor to offend any individual’s societal, cultural, or educational beliefs. The presentation contains pictures and views that serve as anecdotes for the views present in today’s society, which served as the foundation for this research.
Problem & Hypothesis
Problem Statement:
African-American males are involved in more fatal incidents with the police than males of any other ethnicity.
Hypothesis Statement:
The race of police officers serves as a precursor for the disproportionate number of fatalities at the hands of police among African-American (race) males (gender) between the ages of 15 and 30 (age), living below the poverty line (income), who are unarmed (weapon classification).
Data & Systems Architecture
Path 1
Data Modeling & Visualization
Data Ingestion
Path One Description: The primary data set for this path consists of tweets from 20 Twitter user accounts, which produce 4,000 data records. Tweets are ingested routinely, either at the time of a new event or, at a minimum, at two-week intervals.
The data is ingested using the Tweepy package, then wrangled and stored in PostgreSQL. The final product of this data collection is shown visually as word clouds generated from Python/Conda source code.
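A word cloud is driven by word frequencies, so the wrangling step for this path can be pictured as a counting pass over the tweet text. The sketch below is illustrative only: the function name `word_frequencies`, the short stop-word list, and the tokenizing rules are assumptions, not the team's actual code.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "rt"}


def word_frequencies(tweets, top_n=50):
    """Return the top_n most common words across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        # Drop URLs and @mentions before tokenizing.
        cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        # Keep hashtags and plain words (letters, digits, apostrophes).
        tokens = re.findall(r"#?[a-z0-9']+", cleaned)
        # Skip stop words and single-character tokens.
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 1)
    return counts.most_common(top_n)
```

The resulting (word, count) pairs could then be handed to a word-cloud renderer, for example by converting them to a dictionary of frequencies.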
Path Two Description: This path includes structured datasets collected from various sources to produce snapshots of the variables analyzed in support of the team’s hypothesis. The data is stored in an Amazon Web Services (AWS) database, wrangled and stored in PostgreSQL, then presented using Tableau as the visualization tool.
20 Twitter User Accounts:
• 15 celeb/advocate accounts
• 5 News & Media accounts
Data Munging/Wrangling
Path 2
Data Architecture
Data Analysis: Encumbrances
Path One Problems
Twitter data ingestion was limited to 200 tweets per pass.
Because the users posted on a wide array of subjects, isolating the relevant tweets presented a collection issue.
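A per-pass cap like the one above is usually worked around by paging backwards through a timeline. The sketch below shows that pattern under stated assumptions: `fetch_page` is a hypothetical callable that returns a list of dictionaries with an integer `"id"`, newest first. With Tweepy it could wrap `api.user_timeline`, which accepts `count` and `max_id`, but that wiring is not shown here and is not taken from the team's code.

```python
def fetch_all(fetch_page, page_size=200, max_pages=16):
    """Collect tweets page by page, walking backwards in time with max_id.

    fetch_page(count=..., max_id=...) is a hypothetical callable that returns
    a list of dicts, each with an integer "id", ordered newest first.
    max_pages is an arbitrary safety cap on the number of passes.
    """
    collected = []
    max_id = None  # None means "start from the newest tweet"
    for _ in range(max_pages):
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # The next pass asks only for tweets older than the oldest one seen.
        max_id = min(t["id"] for t in page) - 1
    return collected
```

Subtracting one from the oldest id keeps consecutive passes from returning the same tweet twice.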
Updated: May 5, 2016
Twitter Account Analysis (not listed in any specific order)
Tier | Four | Three | Two | One
Tier Limits | >300,000 followers | <300,000 followers | <100,000 followers | <50,000 followers
Tier Count | 4 | 3 | 5 | 3
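The follower thresholds above can be written as a small lookup. This is a sketch under assumptions: the function name `assign_tier` and the `is_media` flag are hypothetical, the "Elite" label is taken from the News & Media accounts listed later, and the handling of an account with exactly 300,000 followers is a guess because the tier limits do not cover that value.

```python
def assign_tier(followers, is_media=False):
    """Map a follower count to the tier labels used in the account analysis."""
    if is_media:
        # Assumption: News & Media accounts are labeled "Elite" regardless of size.
        return "Elite"
    if followers > 300_000:
        return "Four"
    if followers >= 100_000:
        # Assumption: exactly 300,000 falls here, since Tier Four is strictly >300,000.
        return "Three"
    if followers >= 50_000:
        return "Two"
    return "One"
```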
Field | User 1 | User 2 | User 3 | User 4 | User 5
Name | Cornel West | Tavis Smiley | Dr. Umar Johnson | Rev. Al Sharpton | Khym Ringgold
Twitter Name | @CornelWest | @tavissmiley | @DrUmarJohnson | @TheRevAl | @Login2truth
Verified Account | Yes | Yes | No | Yes | No
Followers | 750k | 318k | 46k | 489k | 10.7k
Tweets | 3k | 12.5k | 6.5k | 11.2k | 8k
Tier | | | | |
Bio Keywords | Public Intellectuals | Advocate | Pan-Africanist | White Supremacy | White Supremacy
Bio Keywords | Racial Justice | Entrepreneur | Political Scientist | Pan-Africanist | Pan-Africanist
Bio Keywords | Progressive Politics | Pres. Of Team | Pan-Afrikan | Injustice | Injustice
Field | User 6 | User 7 | User 8 | User 9 | User 10
Name | Jamilah Lemieux | Johnetta Elzie | Jeffrey Wright | Deray McKesson | Marc Lamont Hill
Twitter Name | @JamilahLemieux | @Nettaaaaaaaa | @jfreewright | @deray | @marclamonthill
Verified Account | Yes | Yes | Yes | Yes | Yes
Followers | 82k | 122k | 88.4k | 345k | 224k
Tweets | 186k | 175k | 14.3k | 159k | 57k
Tier | | | | |
Bio Keywords | Senior Editor, Ebony | Soldier | Chicken foot | Activist | Morehouse
Bio Keywords | Howard University | War | Kook | Educator | Colored Boys
Bio Keywords | Stone | Builder | Actor | Mayoral Candidate | KAY
Field | User 11 | User 12 | User 13 | User 14 | User 15
Name | Black Lives Matter | W. Kamau Bell | #JusticeForDeriante | Michael Eric Dyson | Malcolm-Jamal Warner
Twitter Name | @Blklivesmatter | @wkamaubell | @SankofaBrown | @MichaelEDyson | @MalcolmJamalWar
Verified Account | No | Yes | No | Yes | Yes
Followers | 110k | 77.9k | 48.8k | 261.7k | 265k
Tweets | 8k | 25k | 139k | 15.2k | 5.1k
Tier | | | | |
Bio Keywords | Affirmation | CNN | Socialist | Georgetown Professor | Fucks to Give
Bio Keywords | Resistance | Host US Shades of America | Militant | Political Analyst | Poet/Writer
Bio Keywords | Resilience | | Pan-Africanist | Author | Actor/Musician/Director
Field | User 16 | User 17 | User 18 | User 19 | User 20
Name | Fox News | CNN | BET | MSNBC | NAACP
Twitter Name | @FoxNews | @CNN | @BET | @MSNBC | @NAACP
Verified Account | Yes | Yes | Yes | Yes | Yes
Followers | 9.1M | 25M | 1.85M | 966k | 136k
Tweets | 249k | 86k | 70k | 98k | 15.5k
Tier | Elite | Elite | Elite | Elite | Elite
Bio Keywords | America’s Strongest | #GoThere | #ChasingDestinyBET | Political Commentary | Civil Right
Bio Keywords | Insightful Analysis | Difficult Stories | #BlackGirlsRock | Informed Perspectives | Grass Roots
Bio Keywords | Breaking News
Data Analysis: Encumbrances
Path Two
Collecting disparate datasets helped the group analyze different variables associated with the hypothesis, but the lack of complete, related, and comprehensive datasets stymied the project’s statistical analysis.
Data Visualizations: Path One
147 Samples
Decision Tree: The race of police officers serves as a precursor for the disproportionate number of fatalities at the hands of police among African-American (race) males (gender)
scikit-learn models evaluated:
• Linear SVC model
• Logistic Regression
• Decision Tree Classifier
The decision tree model produced the following binary tree graph.
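A binary tree like this is grown by repeatedly choosing the split that most reduces class impurity; Gini impurity is the default criterion in scikit-learn's decision tree classifier. The sketch below computes that reduction for a single candidate split. It is a simplification: it splits on equality for a categorical feature, and the function names, feature name, and sample rows in the usage note are invented for illustration, not drawn from the team's 147 samples.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gain(rows, labels, feature, value):
    """Impurity reduction from splitting on rows[i][feature] == value."""
    left = [y for row, y in zip(rows, labels) if row[feature] == value]
    right = [y for row, y in zip(rows, labels) if row[feature] != value]
    n = len(labels)
    # Weight each child's impurity by its share of the samples.
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted
```

For example, with rows `[{"group": "a"}, {"group": "a"}, {"group": "b"}, {"group": "b"}]` and labels `[0, 0, 1, 1]`, splitting on `group == "a"` separates the classes completely, so the gain equals the parent impurity.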
Data Visualizations: Path One
Group 1: Intellectuals
Group 2: Renaissance Artists
Group 3: Revolutionaries
Group 4: Media
Recommendations for Data
Path One: Collection
Use a more sophisticated API (e.g., the Twitter Firehose) that allows greater ingestion bandwidth and the use of criteria (keywords, locations, users, etc.).
Supplement the path by setting up user polls to get a better idea of how the general public feels.
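The criteria-based collection recommended above can be approximated on already-ingested data with a simple filter. The sketch below assumes each tweet is a dictionary with `"text"` and `"user"` keys; that shape, the function name `matches_criteria`, and the substring matching are all illustrative assumptions.

```python
def matches_criteria(tweet, keywords=(), users=()):
    """Return True if a tweet satisfies the keyword and user criteria.

    An empty criterion is treated as "no restriction". Keyword matching is a
    case-insensitive substring test, so "police" also matches "policeman".
    """
    text = tweet["text"].lower()
    keyword_ok = not keywords or any(k.lower() in text for k in keywords)
    allowed_users = {u.lower() for u in users}
    user_ok = not users or tweet["user"].lower() in allowed_users
    return keyword_ok and user_ok
```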
Path One: Assessment
More data would allow a probabilistic assessment with statistical significance.
Continue sentiment analysis with a Naive Bayes classifier, following the methodology outlined in Gamallo & Garcia (2014).
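To make the Naive Bayes recommendation concrete, the sketch below implements a generic multinomial Naive Bayes classifier with Laplace smoothing. It is not the specific pipeline described by Gamallo & Garcia (2014); the class name `NaiveBayesSentiment`, the whitespace tokenizer, and the labels used in the usage note are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A model would be trained with `NaiveBayesSentiment().fit(texts, labels)` on hand-labeled tweets and then applied to new text with `predict`. Working in log space avoids numeric underflow when a tweet contains many tokens.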
Recommendations for Analysis
Path Two: Collection
Employ unstructured data by using web scrapers to pull content from various national, regional, and local news stations.
Develop national databases that combine the efforts of the organizations that sponsor user-submitted files.
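As a rough picture of the scraping recommendation, the sketch below pulls headline text out of an HTML page using only the Python standard library. It assumes headlines sit in `<h1>` to `<h3>` elements, which will not hold for every news site. The class and function names are hypothetical, and fetching the page itself is left out.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of <h1>, <h2>, and <h3> elements from an HTML page."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._depth = 0    # greater than 0 while inside a heading element
        self._buffer = []  # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.headlines.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_headlines(html):
    """Return the heading texts found in an HTML string, in document order."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines
```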
Path Two: Assessment
Assess the hypothesis that contributes to the best statistical analysis, as opposed to analyzing all hypotheses concurrently.
Focus on the assessment of the dataset yielding the best analysis for the number of variables included.