
custom_datasets

Dataset collections overview:

Currently, the datasets can be divided into the following classes:

  • language knowledge

    • summarization

    • translation

  • dialogue: don't let the user know you are a robot

  • STEM: knowledge about the world

    • code

    • world knowledge (ideally, we want to handle this via prefix context)

  • QA (question answering)

Issues and TODO:

  • As the number of datasets grows, how can we avoid having to update this section so often?

  • Ideally, we could just update the config YAML and have new datasets downloaded from the hub automatically.

    • One possible idea is to upload the already-transformed versions of these datasets to the OA hub.
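A minimal sketch of the config-driven idea, assuming the config lists dataset names per class and each dataset is already on the hub in transformed form. `HUB_PREFIX`, the class keys, `parse_config`, `load_all`, and every dataset name are hypothetical, not part of this repo. The config is shown as a plain dict; in practice it would come from the YAML file, e.g. via PyYAML's `yaml.safe_load`.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical hub namespace holding the pre-transformed datasets.
HUB_PREFIX = "OA-hub"

# Top-level dataset classes from the overview (sub-classes flattened away).
KNOWN_CLASSES = {"language_knowledge", "dialogue", "stem", "qa"}


@dataclass(frozen=True)
class DatasetSpec:
    name: str
    dataset_class: str

    @property
    def hub_id(self) -> str:
        # Hypothetical naming scheme: "<prefix>/<dataset name>".
        return f"{HUB_PREFIX}/{self.name}"


def parse_config(config: dict[str, list[str]]) -> list[DatasetSpec]:
    """Turn {class: [dataset names]} into a flat, validated list of specs."""
    specs = []
    for dataset_class, names in config.items():
        if dataset_class not in KNOWN_CLASSES:
            raise ValueError(f"unknown dataset class: {dataset_class!r}")
        for name in names:
            specs.append(DatasetSpec(name=name, dataset_class=dataset_class))
    return specs


def load_all(
    config: dict[str, list[str]], download: Callable[[str], Any]
) -> dict[str, Any]:
    """Fetch every configured dataset through the injected `download` function."""
    return {spec.name: download(spec.hub_id) for spec in parse_config(config)}


if __name__ == "__main__":
    # Adding a dataset means adding a name here; no loader code changes.
    config = {"dialogue": ["example_chat"], "qa": ["example_qa"]}
    print(load_all(config, download=lambda hub_id: f"<data from {hub_id}>"))
```

Keeping `download` injectable means the same code works whichever hub is used; if the OA hub were the Hugging Face Hub, something like `datasets.load_dataset` could fill that role, though that wiring is an assumption here.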