forked from akaiketech/internship-assignment-nlp
Report.txt
Extraction of PDF:
In this project, text is extracted from the PDF with the PyPDF module in Python. For easy access I have wrapped this in a function, extract_text_from_pdf,
which takes the PDF path, extracts the text from it, and stores it in a variable called context.
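A minimal sketch of what such a function could look like is given here. It assumes the pypdf package and its PdfReader class; the report only says "PyPDF", so the exact package and the join_page_texts helper are assumptions for illustration.

```python
from pathlib import Path


def join_page_texts(page_texts):
    """Join per-page strings into one context string, skipping empty pages."""
    return "\n".join(t.strip() for t in page_texts if t and t.strip())


def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF as a single string."""
    # Assumption: the pypdf package is installed. It is imported lazily so
    # the helper above can be used without it.
    from pypdf import PdfReader

    reader = PdfReader(str(Path(pdf_path)))
    # extract_text() can return an empty string for image-only pages.
    return join_page_texts(page.extract_text() for page in reader.pages)
```

With this sketch, context = extract_text_from_pdf("chapter.pdf") would hold the whole document as one string ("chapter.pdf" is a placeholder path).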
MCA questions:
The context extracted from the PDF is passed to the get_mca_questions function, which accepts only string data.
The context then goes through a series of preprocessing steps, because the extracted text needs to be cleaned.
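The report does not list the exact preprocessing steps, so the following is only an illustrative sketch of typical cleanup for PDF text. The function name preprocess_text and the three steps shown are assumptions, not the project's actual code.

```python
import re


def preprocess_text(context):
    """Clean raw PDF text (illustrative steps, not the project's exact list)."""
    if not isinstance(context, str):
        raise TypeError("context must be a string")
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", context)
    # Replace non-printable control characters with a space.
    text = "".join(ch if ch.isprintable() or ch.isspace() else " " for ch in text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

The type check mirrors the statement above that only string data is accepted.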
->Keyword Extraction
After preprocessing, the text is passed to a model called KeyBERT, which extracts the keywords from the given context.
These keywords are stored in a variable KeyBERT_ans.
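As a rough sketch of this step, the function below calls KeyBERT's extract_keywords method and keeps only the keyword strings. The wrapper name, the top_n default, and the single-word n-gram range are assumptions chosen for illustration; the model argument exists only so that a preloaded (or stand-in) model can be passed in.

```python
def extract_keywords(context, top_n=5, model=None):
    """Return up to top_n keywords (strings only) for the given context."""
    if model is None:
        # Assumption: the keybert package is installed. KeyBERT() loads its
        # default sentence-embedding model.
        from keybert import KeyBERT
        model = KeyBERT()
    # For a single document, extract_keywords returns (keyword, score) pairs.
    pairs = model.extract_keywords(
        context,
        keyphrase_ngram_range=(1, 1),
        stop_words="english",
        top_n=top_n,
    )
    return [keyword for keyword, _score in pairs]
```

In the report's naming, the result would be stored as KeyBERT_ans = extract_keywords(context).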
->Question Generation:
Using these keywords, the questions are generated with the T5 transformer model.
As time was limited I had to go with pretrained models, so I used a T5 model for conditional generation trained on the SQuAD dataset by ramsrigouthamg.
link=https://huggingface.co/ramsrigouthamg/t5_squad
(I thought of various methods like LSTM and BART classification, but due to the time constraint I'm using a pretrained model.)
The get_question function generates one question per keyword and stores them in the ques variable.
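The sketch below shows one way a get_question function could drive the pretrained T5 model through the Hugging Face transformers library. The "context: ... answer: ..." prompt format, the "question:" output prefix, the generation settings, and the function signatures are all assumptions and should be checked against the model card. It also assumes the model repository ships tokenizer files.

```python
def build_prompt(context, answer):
    # Assumed input format; verify against the model card before relying on it.
    return "context: {} answer: {}".format(context, answer)


def clean_question(decoded):
    """Strip an optional leading 'question:' label from the decoded output."""
    text = decoded.strip()
    if text.lower().startswith("question:"):
        text = text[len("question:"):]
    return text.strip()


def load_model(name="ramsrigouthamg/t5_squad"):
    # Assumption: transformers, sentencepiece and torch are installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer
    model = T5ForConditionalGeneration.from_pretrained(name)
    tokenizer = T5Tokenizer.from_pretrained(name)
    return model, tokenizer


def get_question(context, answer, model, tokenizer, max_length=72):
    """Generate one question whose answer is the given keyword."""
    encoding = tokenizer(
        build_prompt(context, answer),
        max_length=512,
        truncation=True,
        return_tensors="pt",
    )
    outputs = model.generate(
        input_ids=encoding["input_ids"],
        attention_mask=encoding["attention_mask"],
        num_beams=5,
        early_stopping=True,
        max_length=max_length,
    )
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return clean_question(decoded)
```

One question per keyword could then be collected as ques = [get_question(context, kw, model, tokenizer) for kw in KeyBERT_ans].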
->Option Generation
The options are generated from the WordNet synsets and hyponyms of the keyword, and the keyword itself is given as one option.
These are combined to give a total of 4 options.
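A possible shape for this step is sketched below using NLTK's WordNet interface. Taking only the first noun synset, the helper names, and the shuffling are assumptions for illustration; the sketch returns fewer than 4 options when WordNet offers too few hyponyms.

```python
import random


def wordnet_candidates(keyword):
    """Collect lemma names from hyponyms of the keyword's first noun synset."""
    # Assumption: nltk is installed and the 'wordnet' corpus has been
    # downloaded, e.g. with nltk.download("wordnet").
    from nltk.corpus import wordnet as wn

    synsets = wn.synsets(keyword.replace(" ", "_"), pos=wn.NOUN)
    if not synsets:
        return []
    names = []
    for hyponym in synsets[0].hyponyms():
        for lemma in hyponym.lemmas():
            names.append(lemma.name().replace("_", " "))
    return names


def build_options(keyword, candidates, n_options=4, seed=None):
    """Return shuffled choices: the keyword plus unique distractors."""
    seen = {keyword.lower()}
    distractors = []
    for cand in candidates:
        if len(distractors) == n_options - 1:
            break
        if cand.lower() not in seen:
            seen.add(cand.lower())
            distractors.append(cand)
    options = [keyword] + distractors
    # Shuffle so the correct answer is not always listed first.
    random.Random(seed).shuffle(options)
    return options
```

For a keyword kw, the options would be build_options(kw, wordnet_candidates(kw)).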
->Now the questions and options are zipped together and appended to a list, so that the get_mca_questions function can return a list of questions and options.
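The pairing step can be sketched as follows. The helper name and the (question, options) tuple layout are assumptions; the report does not state the exact structure of the returned list.

```python
def pair_questions_with_options(questions, options):
    """Zip each question with its list of options into one result list."""
    mca = []
    # zip stops at the shorter input, so unmatched items are dropped.
    for question, choices in zip(questions, options):
        mca.append((question, choices))
    return mca
```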
It generates questions as of now, but a lot of fine-tuning work still has to be done.