At last! This repo contains code and data to generate as much test data as you want for your ranker. You can run something like
python pull_data.py -p twitter -numposts 50 -randomseed 999
to get a random set of tweets (deterministic for a given seed), written to stdout. You can ask for “facebook” or “reddit” in the same way.
It’s hard to find good public data sets of social media posts, and what’s out there doesn’t really represent what someone would see in their feed. What we could find is mostly scrapes of posts from high-profile public accounts. But at least this program will generate correctly formatted JSON of the sort that will be given to your ranker.
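If you want to check the seed behaviour before building on it, something like the sketch below may help. It runs a command twice and compares the parsed JSON. The helper names `run_generator` and `is_deterministic` are made up for this example, and the sketch assumes that stdout contains nothing but the JSON document and that you run it from the directory holding `pull_data.py`.

```python
import json
import subprocess
import sys

# The command from above, as an argument list.
CMD = [sys.executable, "pull_data.py", "-p", "twitter",
       "-numposts", "50", "-randomseed", "999"]


def run_generator(cmd):
    """Run a command and return its stdout parsed as JSON.

    Assumption: stdout holds only the JSON document, with no extra
    log lines mixed in. Raises if the command exits with an error.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def is_deterministic(cmd):
    """Return True if two runs of the same command give identical JSON."""
    return run_generator(cmd) == run_generator(cmd)
```

Calling `is_deterministic(CMD)` should return True if the same seed really does give the same posts each time. You can also keep the value returned by `run_generator(CMD)` around as a fixed test input for your ranker.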