The Measuring Hate Speech project aims to develop models, datasets, and theoretical frameworks capable of measuring hate speech.
The majority of work in automated hate speech detection treats hate speech as a binary phenomenon: a piece of text either is hate speech or is not. This limited perspective does not account for the multifaceted nature of hate speech or for disagreements among individuals as to what constitutes hate speech.
Using Rasch measurement theory, we have developed a continuous measurement scale for hate speech, capable of accommodating annotator perspective. By combining the measurement scale with large language models, we have developed tools that can measure the hatefulness of text at scale.
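To make the idea of a continuous scale concrete, here is a minimal sketch of a dichotomous Rasch-style model with an added annotator severity term. It is an illustration under stated assumptions, not the project's actual model: the function name, the parameter names (`theta`, `delta`, `severity`), and all numeric values are hypothetical, and the real survey items may call for a more elaborate formulation.

```python
import math


def rasch_probability(theta: float, delta: float, severity: float = 0.0) -> float:
    """Probability that an annotator endorses a survey item for a comment.

    theta    -- location of the comment on the continuous scale (logits)
    delta    -- difficulty of the survey item (logits)
    severity -- hypothetical annotator term; a larger value means the
                annotator is less inclined to endorse the item

    With severity = 0 this is the standard dichotomous Rasch model:
    P = 1 / (1 + exp(-(theta - delta))).
    """
    return 1.0 / (1.0 + math.exp(-(theta - delta - severity)))


# Made-up values, chosen only to show how the probability moves.
comment_locations = {"mild": -1.0, "moderate": 0.5, "severe": 2.0}
item_difficulty = 0.5

for label, theta in comment_locations.items():
    lenient = rasch_probability(theta, item_difficulty, severity=-0.5)
    strict = rasch_probability(theta, item_difficulty, severity=0.5)
    print(f"{label}: lenient annotator {lenient:.3f}, strict annotator {strict:.3f}")
```

In this sketch, a comment placed higher on the scale is more likely to be endorsed on any given item, and two annotators with different severities can disagree about the same comment without either response being treated as an error.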
The Measuring Hate Speech project began in early 2017 at UC Berkeley’s D-Lab. We continue to work on the project, both through academic research and by offering our expertise, tools, and methods, via consultations, to those who could benefit from them. If you are interested in partnering with us, please reach out using the contact form at the bottom of the website.
Our datasets and models are freely and openly available on HuggingFace.
The Measuring Hate Speech dataset contains nearly 50,000 annotations of over 10,000 unique social media comments (from YouTube, Reddit, and Twitter). The annotations contain responses to 10 survey items that span our hate speech construct, along with information about the targets of the speech and annotator demographics.
Our neural network models are also available on HuggingFace. The code for these models is available on GitHub. The code to reproduce the work in our papers is also available on GitHub (see below).