Currently, the datasets can be divided into 3 classes:
- language knowledge
- dialogue: don't let the user know you are a robot
- STEM: knowledge about the world

World knowledge is not one of these classes; ideally, we want to handle it via prefix context.
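To make "prefix context" concrete, here is a minimal sketch. It assumes the world knowledge arrives as plain-text snippets (for example, from a retriever) that are prepended to the prompt, so the model reads them before the user's message. The `build_prompt` helper and the `Context:` / `User:` / `Assistant:` markers are hypothetical and are not the project's actual prompt format.

```python
from typing import Sequence


def build_prompt(context_snippets: Sequence[str], user_message: str) -> str:
    """Prepend world-knowledge snippets to the prompt as a prefix context.

    Hypothetical helper: the "Context:", "User:" and "Assistant:" markers are
    illustrative placeholders, not the project's real prompt template.
    """
    # One snippet per line, rendered as a simple bullet list.
    context = "\n".join(f"- {snippet}" for snippet in context_snippets)
    # Only emit the context block when there is something to put in it.
    prefix = f"Context:\n{context}\n\n" if context_snippets else ""
    return f"{prefix}User: {user_message}\nAssistant:"
```

For example, `build_prompt(["Paris is the capital of France."], "What is the capital of France?")` puts the fact in front of the question, so the model does not need to have memorized it during training.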
Issues and TODO:
- As the number of datasets grows, how can we avoid updating this section every time?
- Ideally, we would only update the config YAML, and any new dataset would be downloaded from the hub.
  - One possible idea is to upload these datasets, already transformed into our format, to the OA hub.
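One way the "only update the config YAML" idea could look is sketched below, under stated assumptions. The sketch assumes the config lists dataset names under a `datasets` key, and that each pre-transformed dataset lives on the hub under a repo id derived by naming convention, so adding a dataset means adding one line to the YAML and no code changes. The `HUB_ORG` value, the `datasets` key, and both helper functions are hypothetical. The loader is passed in as a callable; in practice it could be Hugging Face `datasets.load_dataset`, which accepts a hub repo id as its first argument, and the config dict would come from `yaml.safe_load`.

```python
from typing import Callable, Dict, List, Mapping, TypeVar

T = TypeVar("T")

# Hypothetical hub organisation hosting the pre-transformed datasets.
HUB_ORG = "example-oa-org"


def repo_id_for(name: str, org: str = HUB_ORG) -> str:
    """Map a dataset name from the config to a hub repo id by convention."""
    # Names that already include an owner ("owner/name") are used unchanged.
    return name if "/" in name else f"{org}/{name}"


def load_datasets_from_config(
    config: Mapping[str, List[str]], loader: Callable[[str], T]
) -> Dict[str, T]:
    """Load every dataset listed under config["datasets"] using `loader`.

    `config` is the parsed config (e.g. the result of yaml.safe_load);
    `loader` fetches one dataset given its hub repo id.
    """
    return {name: loader(repo_id_for(name)) for name in config.get("datasets", [])}
```

With `config = {"datasets": ["dialogue_v1", "someone/stem_v2"]}`, the first entry resolves to `example-oa-org/dialogue_v1` and the second is fetched as-is. Injecting the loader keeps the sketch testable offline and leaves the choice of hub client open.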