This Python code generates topics, questions, and answers from a paragraph of text. It is a good way to build ground-truth knowledge about a topic from a trusted source. The following terms are used:
- Topic: A word or short phrase that describes the subject of the paragraph, such as Biology or Stem Cells.
- Prefix: An introductory phrase that adds context to a question, such as "Speaking of stem cells,"
- Open Book Answer: An answer to a question that was generated using the provided paragraph as guidance.
- Closed Book Answer: An answer to a question that was generated without the use of the provided paragraph.
- Formatted Answer: A version of an answer reworded to express a level of certainty that matches the answer's confidence.
- Confidence: A score between 0 and 1 that measures the similarity between the closed book answer and the open book answer.
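
As a rough illustration of how Confidence and Formatted Answer relate, here is a minimal sketch. The similarity measure used by this code is not specified above, so the sketch assumes a character-level ratio from Python's standard `difflib`. The function names, thresholds, and hedging phrases are all hypothetical.

```python
from difflib import SequenceMatcher


def confidence(closed_book: str, open_book: str) -> float:
    """Return a similarity score in [0, 1] between the two answers.

    Assumption: difflib's ratio stands in for whatever similarity
    measure the real code uses.
    """
    a = closed_book.strip().lower()
    b = open_book.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def format_answer(answer: str, score: float) -> str:
    """Prefix the answer with a hedge that matches its confidence.

    The thresholds (0.8, 0.4) and phrasings are illustrative only.
    """
    if score >= 0.8:
        return answer
    if score >= 0.4:
        return f"I think {answer}"
    return f"I'm not sure, but {answer}"
```

A closed book answer that matches the open book answer exactly scores 1.0 and is returned unchanged. Lower scores add progressively stronger hedging.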
The code outputs a dictionary containing the following information:
- Submitted paragraph
- Sample topics
- Sample questions
- Sample answers
- Generated topics
- Generated questions
- Generated prefixes
- Generated open book answer
- Generated closed book answer
- Generated closed book answer with generated prefix as context
- Formatted generated closed book answer
- Formatted generated closed book answer with generated prefix as context
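
The shape of that dictionary might look like the sketch below. The key names and value types are assumptions chosen to mirror the list above, not the actual keys used by the code. In particular, it assumes one list entry per generated question.

```python
from typing import List, TypedDict


class QAResult(TypedDict):
    """Hypothetical layout of the output dictionary (key names are assumed)."""
    paragraph: str
    sample_topics: List[str]
    sample_questions: List[str]
    sample_answers: List[str]
    generated_topics: List[str]
    generated_questions: List[str]
    generated_prefixes: List[str]
    open_book_answers: List[str]
    closed_book_answers: List[str]
    closed_book_answers_with_prefix: List[str]
    formatted_closed_book_answers: List[str]
    formatted_closed_book_answers_with_prefix: List[str]


def empty_result(paragraph: str) -> QAResult:
    """Build a result holding the submitted paragraph and empty fields."""
    return QAResult(
        paragraph=paragraph,
        sample_topics=[],
        sample_questions=[],
        sample_answers=[],
        generated_topics=[],
        generated_questions=[],
        generated_prefixes=[],
        open_book_answers=[],
        closed_book_answers=[],
        closed_book_answers_with_prefix=[],
        formatted_closed_book_answers=[],
        formatted_closed_book_answers_with_prefix=[],
    )
```

At runtime a `TypedDict` is a plain `dict`, so the result can be serialized or indexed like any other dictionary.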
This code is verified to work on a graphics card with 24 GB of VRAM (such as an RTX 3090). We are working on getting it to run on Google Colab TPUs. It may also be possible to use smaller T5 models, such as the 3 billion parameter model, and still get acceptable results.
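
To get a feel for why model size matters here, the sketch below estimates the memory needed for model weights alone. It assumes 2 bytes per parameter (half precision) and ignores activations and other overhead, so real usage will be higher. The function name is hypothetical, and the precision this code actually loads the model in is not stated above.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate memory for model weights only, in GiB.

    Assumption: 2 bytes per parameter (half precision) by default.
    Activations and framework overhead are not included.
    """
    return num_params * bytes_per_param / 2**30


# Weights-only estimate for a 3 billion parameter model
print(f"{weight_memory_gib(3e9):.2f} GiB")
```

Passing `bytes_per_param=4` gives the corresponding full-precision estimate.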