We’ve said that the Prosocial Ranking Challenge will address three areas: conflict, information, and well-being. This means we will be measuring outcomes in these three categories, mostly using surveys at 0, 2, and 4 months. We will also measure short-term outcomes using very brief in-feed surveys (1-3 questions), as well as a number of exposure and engagement outcomes.
We can’t measure absolutely everything we’d like, because there is a limit to how many questions people will answer. We are still conferring with our scientific advisors (these fine people) about exactly which variables to test. There may still be changes to the list below.
Note that you don’t have to try to move all of these measures. These are all options for you. A ranker that is designed to move a single one of these outcomes may win, if it’s interesting enough (and you can put together a good argument that it will work). But we will test all rankers against all outcomes, and many of these are going to be correlated, so you may want to think about the set of measures you expect to change.
Types of DVs
In general, we are interested in four different categories of outcomes:
Conflict and polarization
Well-being and addiction
Information and misinformation
Viewing and posting behavior
We are measuring these outcomes in several different ways:
Surveys at 0, 2, and 4 months using Qualtrics
In-feed surveys. These look like posts but ask 1-2 quick questions. We will ask at most weekly.
Exposure: what participants saw in their feeds.
Behavioral outcomes, from recorded clicks, likes, shares, etc. and what they posted.
In some categories we also distinguish primary from secondary outcomes. We will pre-register predictions of the effects on our primary outcomes and control for multiple comparisons. Secondary outcomes will not be pre-registered or corrected for multiple comparisons, which makes those results more exploratory in nature.
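For illustration, here is a minimal sketch of what a multiple-comparisons correction over the primary outcomes might look like. The p-values are placeholders and the Holm procedure is shown only as one example; we have not committed to a specific correction method here.

```python
from statsmodels.stats.multitest import multipletests

# Placeholder p-values, one per pre-registered primary outcome.
primary_pvals = [0.003, 0.021, 0.048, 0.24]

# Holm's step-down procedure, shown as one possible correction method.
reject, adjusted, _, _ = multipletests(primary_pvals, alpha=0.05, method="holm")

for p_adj, rej in zip(adjusted, reject):
    print(f"adjusted p = {p_adj:.3f}, reject null: {rej}")
```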
Add your own DVs
Is there something else you want to study? We will consider adding up to three survey questions for each team. These may be long-term or in-feed questions. This is not guaranteed — please make the case in your submission as to why there is good science to be done.
Behavioral and exposure outcomes, by contrast, place no burden on the user, so we are much more open to adding outcomes that can be derived from data we are already collecting.
All rankers will be tested on all outcomes, including any that are added by other teams.
Survey Outcomes
Conflict
Primary
Affective Polarization feeling thermometer toward political ingroup and outgroup (ANES, 2 questions)
Please indicate how you feel toward [inparty/outparty members] using the scale below. 100 means that you feel very favorable or warm toward them, 0 means that you feel very unfavorable or cold, and 50 means that you feel neutral.
Meta-perceptions (Mernyk et al. 2022, 5 questions)
“How much do you feel it is justified for [own party] to use violence in advancing their political goals these days?”
“When, if ever, is it OK for [own party] to send threatening and intimidating messages to [opposing party] leaders?”
What answer would the average [in-party member / out-party member] give to the above questions?
“Think about the coming presidential election. If [in-partisan] is declared the winner of a contested election, how likely do you think [out-partisan] voters would be to engage in violence?”
Perspective-taking index (from Sirin et al. 2016, 2 questions)
"I find it difficult to see things from Democrats’/Republicans’ point of view”
“I think it is important to consider the perspective of Democrats/Republicans”
Secondary (no pre-registered hypothesis, 2 questions)
Social trust (GSS)
"Generally speaking, would you say that most people can be trusted, or that you can't be too careful in dealing with people?"
Outparty friends (Druckman & Levendusky 2019; Rajadesingan et al. 2023)
"How comfortable are you having close personal friends who are [Outparty]?"
Well-being
Primary
WHO-5 well-being index, prefixed with “in the last two weeks” (5 questions)
“I have felt cheerful and in good spirits”
“I have felt calm and relaxed”
“I have felt active and vigorous”
“I woke up feeling fresh and rested”
“My daily life has been filled with things that interest me”
Neely Center Social Media Index (2 of the 4 items from this index), prefixed with “in the last two weeks”; asked per platform, then combined
“Have you experienced a meaningful connection with others on [platform]?”
“Have you personally witnessed or experienced something that affected you negatively on [platform]?”
Information
Primary
News knowledge quiz (à la Allcott, 10 questions)
Each wave of the survey will use a different set of 10 recent headlines, some of which will be real and some of which will be made up. The quiz tests whether participants know which headlines are real; see the scoring sketch after this list.
Neely Center Social Media Index (2 of the 4 items from this index), prefixed with “in the last two weeks”; asked per platform, then combined
“Have you learned something that was useful or helped you understand something important on [platform]?”
“Have you witnessed or experienced content that you would consider bad for the world on [platform]?”
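As a rough sketch of how the news knowledge quiz could be scored (the exact scoring rule is not fixed here), one natural option is the share of the 10 headlines a participant classifies correctly. All names below are hypothetical.

```python
def news_knowledge_score(responses, answer_key):
    """Fraction of headlines classified correctly (hypothetical scoring).

    `responses` maps headline IDs to the participant's judgment
    (True = "real"); `answer_key` maps the same IDs to ground truth.
    """
    correct = sum(responses[h] == answer_key[h] for h in answer_key)
    return correct / len(answer_key)


# Example: 10 headlines, participant judges every one as real (True).
answer_key = {f"headline_{i}": (i % 2 == 0) for i in range(10)}
responses = {f"headline_{i}": True for i in range(10)}
print(news_knowledge_score(responses, answer_key))  # 0.5 in this toy example
```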
In-feed surveys
Each ranker will specify whether it is targeting primarily conflict, information, or well-being outcomes. We will ask only the questions in that category, distributing them so that no more than three questions appear at once, and no more often than once per week.
To let us assess experimenter demand effects, only half of the people in each treatment will receive the in-feed surveys. Those who do will see each survey at a randomized position within the first 20 items of their feed.
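A minimal sketch of that assignment logic, assuming a simple per-participant coin flip (the function and parameter names are hypothetical, not part of our actual codebase):

```python
import random

def assign_in_feed_survey(participant_id: int, feed_window: int = 20):
    """Decide whether a participant gets in-feed surveys, and where.

    Half of each treatment group is flagged to receive in-feed surveys;
    for those participants, the survey is slotted at a random (0-indexed)
    position within the first `feed_window` items of the feed.
    """
    rng = random.Random(participant_id)  # deterministic per participant
    receives_survey = rng.random() < 0.5
    position = rng.randrange(feed_window) if receives_survey else None
    return receives_survey, position


print(assign_in_feed_survey(42))
```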
Engagement & Platform Usage
Primary (all per platform, then combined)
Total number of posts seen
Total time spent
Rate of engagement types (clicks, likes, shares, comments etc.)
overall
politics/civic
news
Engagement with toxicity
Average toxicity (Jigsaw, Dignity Index) of posts engaged with
Odds ratio of engaging with toxic vs. non-toxic posts (binarized Jigsaw, Dignity Index); see the sketch after this list
Average toxicity of posts created
Toward ingroup vs. outgroup members
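For the odds-ratio measure above, here is a minimal sketch of the computation, assuming one boolean record per post shown (did it cross the binarized toxicity threshold, and did the participant engage with it). The smoothing constant is our own illustrative choice, not a committed detail.

```python
import numpy as np

def toxicity_engagement_odds_ratio(toxic, engaged, smoothing=0.5):
    """Odds ratio of engaging with toxic vs. non-toxic posts seen.

    `toxic` and `engaged` are equal-length boolean arrays, one entry per
    post shown. The Haldane-Anscombe correction (`smoothing`) keeps the
    ratio defined when a cell of the 2x2 table is empty.
    """
    toxic = np.asarray(toxic, dtype=bool)
    engaged = np.asarray(engaged, dtype=bool)
    a = np.sum(toxic & engaged) + smoothing       # toxic, engaged
    b = np.sum(toxic & ~engaged) + smoothing      # toxic, not engaged
    c = np.sum(~toxic & engaged) + smoothing      # non-toxic, engaged
    d = np.sum(~toxic & ~engaged) + smoothing     # non-toxic, not engaged
    return (a / b) / (c / d)


# Example: 6 posts seen, 3 toxic; the participant engaged with 2 of the
# toxic posts and 1 of the non-toxic posts.
print(toxicity_engagement_odds_ratio(
    toxic=[1, 1, 1, 0, 0, 0],
    engaged=[1, 1, 0, 1, 0, 0],
))
```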
Secondary Outcomes (all per-platform, then combined)
Retention (percentage of users who used the platform at least once in month 4 of 4, correcting for study dropout); see the sketch after this list.
Between-subjects vs. control group
Within-subjects vs. baseline period
Distribution of scroll depth (posts seen per session)
Engagements (clicks etc.) broken out by content type
politics/civic, news
misinformation domain
Number of ads seen
Time spent on other social media platforms (via URL tracking)
Percentage of social media usage that occurs on mobile
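A minimal sketch of the retention calculation above, assuming the dropout correction simply restricts to participants who completed the study (the actual adjustment may differ):

```python
import numpy as np

def retention_rate(used_in_month4, completed_study):
    """Share of study completers who used the platform in month 4 of 4.

    `used_in_month4`: True if the participant used the platform at least
    once in the final month. `completed_study`: True if the participant
    did not drop out of the study before month 4.
    """
    used = np.asarray(used_in_month4, dtype=bool)
    completed = np.asarray(completed_study, dtype=bool)
    return used[completed].mean()


# Example: 5 participants, 4 completed the study, 3 of those still used the platform.
print(retention_rate(
    used_in_month4=[1, 1, 1, 0, 0],
    completed_study=[1, 1, 1, 1, 0],
))  # 0.75
```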
Feed changes
These measures should move mechanically as a result of ranker operation; we will use them as descriptive statistics, manipulation checks, and intermediate variables.
Measurement of feed change
Per-session number of items added and deleted
Measures of difference between the lists input to and output from the ranker (e.g. rank-biased overlap, RBO); see the sketch at the end of this section
Posts seen and posts served (served posts are not necessarily seen, due to limited scroll depth), by type
political/civic
news content (defined as from news domains)
toxic (binarized Jigsaw and Dignity Index scores)
misinformation (based on a database of domains from Lin et al.)
Distribution over posts seen and posts served
Information quality (using MBFC “factual” ratings, à la Freelon 2024)
Toxicity (Jigsaw, Dignity)
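As an illustration of the kind of list-difference measure we mean, here is a minimal sketch of truncated rank-biased overlap (Webber et al. 2010). It is only an approximation of the full metric, which adds an extrapolation term for the unseen tails of the lists.

```python
def rbo(list_a, list_b, p=0.9):
    """Truncated rank-biased overlap between two ranked lists of post IDs.

    Agreement near the top of the lists is weighted more heavily;
    a smaller p puts more weight on the very top ranks.
    """
    depth = min(len(list_a), len(list_b))
    if depth == 0:
        return 0.0
    seen_a, seen_b = set(), set()
    total = 0.0
    for d in range(1, depth + 1):
        seen_a.add(list_a[d - 1])
        seen_b.add(list_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # A_d: overlap proportion at depth d
        total += (p ** (d - 1)) * agreement
    return (1 - p) * total


# Example: compare the feed as served by the platform with the reranked feed.
platform_order = ["p1", "p2", "p3", "p4", "p5"]
reranked_order = ["p2", "p1", "p5", "p3", "p4"]
print(rbo(platform_order, reranked_order))
```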