Comments (2)

@verge
> complain about unlicensed data being used to train AI models
> only cite Reddit posts
> workers openly speak of sabotaging the data when being paid to create licensed data for training AI models

My professional opinion is that this is being used either as evaluation data or as a data source for claiming fully licensed training data ("look, we have all this data we used for training"). They wouldn't be able to hire enough people to create the kind of corpus needed.

@verge "hey come help us try to put you out of a job"