This Python code generates topics, questions, and answers from a paragraph of text. When the paragraph comes from a trusted source, this is a good way to generate ground-truth knowledge about its topic.
The output is a dictionary containing:
- submitted paragraph
- generated topics
- generated questions
- generated topic prefixes that can be prepended to the questions
- open-book answers based only on the provided paragraph
- closed-book answers generated by FLAN-T5-11B, using only the question (optionally with its topic prefix) to generate each answer
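Here is a minimal sketch of how such a dictionary might be assembled. It is not the actual code described above. The key names, the prompt wording, and the `generate` callable are all hypothetical. The sketch assumes the topics, questions, and prefixes have already been generated, and it only shows how the open-book and closed-book answers differ in what the model gets to see.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model interface: takes a prompt string, returns generated text.
# In the real pipeline this would wrap FLAN-T5-11B; here it is left abstract.
Generate = Callable[[str], str]


def open_book_prompt(paragraph: str, question: str) -> str:
    # Open book: the paragraph is included, so the answer can be grounded in it.
    return f"Read the passage and answer the question.\n\nPassage: {paragraph}\n\nQuestion: {question}"


def closed_book_prompt(question: str, prefix: Optional[str] = None) -> str:
    # Closed book: no paragraph, only the question and an optional topic prefix.
    return f"{prefix} {question}" if prefix else question


def build_record(paragraph: str, topics: List[str], questions: List[str],
                 prefixes: List[str], generate: Generate) -> Dict[str, object]:
    """Assemble the output dictionary (illustrative key names)."""
    return {
        "paragraph": paragraph,
        "topics": topics,
        "questions": questions,
        "topic_prefixes": prefixes,
        "open_book_answers": [generate(open_book_prompt(paragraph, q)) for q in questions],
        "closed_book_answers": [generate(closed_book_prompt(q)) for q in questions],
        "closed_book_answers_with_prefix": [
            generate(closed_book_prompt(q, p)) for q, p in zip(questions, prefixes)
        ],
    }
```

Because `generate` is just a function from string to string, you can test the record layout with a stub before loading the 11B model. In practice it could wrap a Hugging Face `transformers` seq2seq model.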
This code has been verified to work on a graphics card with 24 GB of VRAM (such as an RTX 3090). We are working on getting it to run on Google Colab TPUs. It may also be possible to use a smaller T5 model, such as the 3-billion-parameter one, and still get acceptable results.
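A back-of-the-envelope estimate helps explain the 24 GB figure. It assumes the weights are loaded in a 16-bit format such as bfloat16 (2 bytes per parameter); the text above does not say which precision is used. It also uses rounded parameter counts and counts only the weights, ignoring activations and other runtime overhead.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the model weights, in GiB (excludes activations)."""
    return num_params * bytes_per_param / 1024**3


# Rounded parameter counts; the precision options are assumptions, not the project's settings.
for name, params in [("FLAN-T5-11B", 11e9), ("3B T5 model", 3e9)]:
    for dtype, nbytes in [("float32", 4), ("bfloat16", 2)]:
        print(f"{name:12s} {dtype:9s} {weight_memory_gib(params, nbytes):5.1f} GiB")
```

Under these assumptions, the 11B model needs roughly 20.5 GiB in bfloat16, which leaves a little headroom on a 24 GB card. In float32 it needs about 41 GiB and would not fit. A 3B model in bfloat16 needs only about 5.6 GiB.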