Data Sets

Data is the new currency of our age, and it is being used to train sophisticated models for prediction and classification tasks. These models are transforming many domains, including healthcare, technology, and manufacturing. A model is often only as good as its data; as a result, data should be thoroughly cleaned and formatted before it is analyzed. Documentation for popular datasets is sometimes sparse, and better documentation is needed. To support the growth of machine learning and streamline the training process, we have compiled various datasets along with information about them.

Please check back often as we add to this list.

NEW DATA SETS

1. Measuring Massive Multitask Language Understanding; data set with accompanying OpenAI API evaluation code.

2. Natural Adversarial Examples; data set of real-world, unmodified, naturally occurring examples that degrade ML model performance.

3. Measuring Coding Challenge Competence with APPS; a repository containing evaluation code.

4. Measuring Mathematical Problem Solving with the MATH dataset; repository containing dataset loaders and evaluation code.

5. Aligning AI with Shared Human Values; repository whose folders contain fine-tuning scripts for the individual tasks of the ETHICS benchmark. It also includes an interactive script for probing a commonsense morality model and a utilitarianism model.

6. Forecasting Future World Events with Neural Networks; forecasts of climate, geopolitical conflict, pandemics, and economic indicators help shape policy and decision-making.

7. Natural language descriptions of distribution shifts

8. Combined Anomalous Object Segmentation (CAOS)

9. ImageNet-R(endition) and DeepAugment

10. Anomalous models for auditing visualization methods

11. Reward hacking environments

12. Measuring moral behavior in reinforcement learning agents

13. Using data augmentation to improve robustness
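To illustrate the general idea behind data augmentation, here is a minimal sketch using two classic transforms, a random horizontal flip and a random crop. It is an assumption-laden toy: images are plain nested lists of pixel values, and the function names (`horizontal_flip`, `random_crop`, `augment`) are hypothetical. It is not the method used in the augmentation repository listed here (for example, it is not DeepAugment), only the basic principle of producing new training views that share the original label.

```python
import random
from typing import List

# Hypothetical toy representation: a grayscale image as a list of rows of pixel values.
Image = List[List[int]]


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left-to-right."""
    return [list(reversed(row)) for row in img]


def random_crop(img: Image, size: int, rng: random.Random) -> Image:
    """Cut out a size x size window at a uniformly random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)   # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def augment(img: Image, crop_size: int, rng: random.Random) -> Image:
    """Flip with probability 0.5, then crop: a new view that keeps the original label."""
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    return random_crop(img, crop_size, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    for _ in range(3):
        print(augment(image, crop_size=3, rng=rng))
```

Each call yields a slightly different 3 x 3 view of the same 4 x 4 image. Training on many such views is one simple way to encourage a model to ignore nuisance variation, which is the motivation behind augmentation-based robustness methods.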