
Commit 5547ad8

Re-Modifications (#5)
* Add files via upload: added exercises and solutions for the Language Processing Pipeline (Video-9) tutorial.
* Add files via upload: added exercises and solutions for the Stemming and Lemmatization (Video-10) tutorial.
* Add files via upload: added exercises and solutions for the NER (Video-12) tutorial.
* Delete Language_Processing_exercise.ipynb
* Delete Language_Processing_exercise_solutions.ipynb
* Delete Stemming_and_Lemmatization_Exercise.ipynb
* Delete Stemming_and_Lemmatization_Solutions.ipynb
* Delete Named_Entity_Recognition_Exercise.ipynb
* Delete Named_Entity_Recognition_Solutions.ipynb
* Add files via upload
* Add files via upload
* Add files via upload
1 parent 1dc8101 commit 5547ad8

6 files changed: +1123 −0 lines changed
Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Yrci22GYhTQP"
   },
   "source": [
    "### **spaCy Language Processing Pipelines: Exercises**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "YUMPkcohhgam"
   },
   "outputs": [],
   "source": [
    "# importing necessary libraries\n",
    "import spacy\n",
    "\n",
    "nlp = spacy.load(\"en_core_web_sm\")  # creating an nlp object by loading the pre-trained English model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hxtliEGIh4gS"
   },
   "source": [
    "#### **Exercise: 1**\n",
    "\n",
    "- Extract all the proper nouns from the given text into a list, and count them.\n",
    "- A **proper noun** is a noun that names a particular person, place, or thing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "lRGfbeEshFf-",
    "outputId": "f8d6beed-c03a-479c-b7bd-4a21173aba55"
   },
   "outputs": [],
   "source": [
    "text = '''Ravi and Raju are the best friends from school days.They wanted to go for a world tour and \n",
    "visit famous cities like Paris, London, Dubai, Rome etc and also they called their another friend Mohan to take part of this world tour.\n",
    "They started their journey from Hyderabad and spent next 3 months travelling all the wonderful cities in the world and cherish a happy moments!\n",
    "'''\n",
    "\n",
    "# https://spacy.io/usage/linguistic-features\n",
    "\n",
    "# creating the doc object\n",
    "doc = nlp(text)\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WfU6CRIWhFh8"
   },
   "source": [
    "**Expected Output**\n",
    "\n",
    "Proper Nouns: [Ravi, Raju, Paris, London, Dubai, Rome, Mohan, Hyderabad]\n",
    "\n",
    "Count: 8\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FUr2rnbYmdlv"
   },
   "source": [
    "#### **Exercise: 2**\n",
    "\n",
    "- Get all the company names from the given text, along with their count.\n",
    "- **Hint**: Use spaCy's **NER** functionality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "LLf4xyGEmZ2P",
    "outputId": "e9582d9f-4f1e-4574-e3d8-a5526a4fb6cb"
   },
   "outputs": [],
   "source": [
    "text = '''The Top 5 companies in USA are Tesla, Walmart, Amazon, Microsoft, Google and the top 5 companies in \n",
    "India are Infosys, Reliance, HDFC Bank, Hindustan Unilever and Bharti Airtel'''\n",
    "\n",
    "\n",
    "doc = nlp(text)\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4JK5eMsCmZ5i"
   },
   "source": [
    "**Expected Output**\n",
    "\n",
    "\n",
    "Company Names: [Tesla, Walmart, Amazon, Microsoft, Google, Infosys, Reliance, HDFC Bank, Hindustan Unilever, Bharti Airtel]\n",
    "\n",
    "Count: 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HkbNaNVChFoB"
   },
   "source": [
    "## [**Solution**](./language_processing_exercise_solutions.ipynb)"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "Language Processing_exercise.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Yrci22GYhTQP"
   },
   "source": [
    "### **spaCy Language Processing Pipelines: Solutions**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "YUMPkcohhgam"
   },
   "outputs": [],
   "source": [
    "# importing necessary libraries\n",
    "import spacy\n",
    "\n",
    "nlp = spacy.load(\"en_core_web_sm\")  # creating an nlp object by loading the pre-trained English model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hxtliEGIh4gS"
   },
   "source": [
    "#### **Exercise: 1**\n",
    "\n",
    "- Extract all the proper nouns from the given text into a list, and count them.\n",
    "- A **proper noun** is a noun that names a particular person, place, or thing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "lRGfbeEshFf-",
    "outputId": "f8d6beed-c03a-479c-b7bd-4a21173aba55"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Proper Nouns: [Ravi, Raju, Paris, London, Dubai, Rome, Mohan, Hyderabad]\n",
      "Count: 8\n"
     ]
    }
   ],
   "source": [
    "text = '''Ravi and Raju are the best friends from school days.They wanted to go for a world tour and \n",
    "visit famous cities like Paris, London, Dubai, Rome etc and also they called their another friend Mohan to take part of this world tour.\n",
    "They started their journey from Hyderabad and spent next 3 months travelling all the wonderful cities in the world and cherish a happy moments!\n",
    "'''\n",
    "\n",
    "# https://spacy.io/usage/linguistic-features\n",
    "\n",
    "# creating the doc object\n",
    "doc = nlp(text)\n",
    "\n",
    "# list for storing the proper nouns\n",
    "all_proper_nouns = []\n",
    "\n",
    "for token in doc:\n",
    "    if token.pos_ == \"PROPN\":  # checking whether the token's part of speech is \"PROPN\" (proper noun)\n",
    "        all_proper_nouns.append(token)\n",
    "\n",
    "# finally, printing the results\n",
    "print(\"Proper Nouns: \", all_proper_nouns)\n",
    "print(\"Count: \", len(all_proper_nouns))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FUr2rnbYmdlv"
   },
   "source": [
    "#### **Exercise: 2**\n",
    "\n",
    "- Get all the company names from the given text, along with their count.\n",
    "- **Hint**: Use spaCy's **NER** functionality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "LLf4xyGEmZ2P",
    "outputId": "e9582d9f-4f1e-4574-e3d8-a5526a4fb6cb"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Company Names: [Tesla, Walmart, Amazon, Microsoft, Google, Infosys, Reliance, HDFC Bank, Hindustan Unilever, Bharti Airtel]\n",
      "Count: 10\n"
     ]
    }
   ],
   "source": [
    "text = '''The Top 5 companies in USA are Tesla, Walmart, Amazon, Microsoft, Google and the top 5 companies in \n",
    "India are Infosys, Reliance, HDFC Bank, Hindustan Unilever and Bharti Airtel'''\n",
    "\n",
    "\n",
    "doc = nlp(text)\n",
    "\n",
    "# list for storing the company names\n",
    "all_company_names = []\n",
    "\n",
    "for ent in doc.ents:\n",
    "    if ent.label_ == 'ORG':  # checking whether the entity is labelled \"ORG\" (organisation)\n",
    "        all_company_names.append(ent)\n",
    "\n",
    "# finally, printing the results\n",
    "print(\"Company Names: \", all_company_names)\n",
    "print(\"Count: \", len(all_company_names))"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "Language Processing_exercise.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
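
Both exercises in these notebooks reduce to the same pattern: walk a sequence of labelled items and keep the ones matching a target label (`token.pos_ == "PROPN"` for Exercise 1, `ent.label_ == "ORG"` for Exercise 2). A minimal sketch of that pattern, decoupled from spaCy so it runs without `en_core_web_sm`; the hardcoded `(text, label)` pairs are stand-ins for what the model would actually produce:

```python
def filter_by_label(pairs, target):
    """Return the texts whose label equals `target`."""
    return [text for text, label in pairs if label == target]

# Stand-in for `[(t.text, t.pos_) for t in nlp(text)]` (Exercise 1):
tokens = [("Ravi", "PROPN"), ("and", "CCONJ"), ("Raju", "PROPN"),
          ("are", "AUX"), ("Paris", "PROPN")]
proper_nouns = filter_by_label(tokens, "PROPN")
print("Proper Nouns:", proper_nouns)  # → ['Ravi', 'Raju', 'Paris']
print("Count:", len(proper_nouns))    # → 3

# Stand-in for `[(e.text, e.label_) for e in nlp(text).ents]` (Exercise 2):
ents = [("Tesla", "ORG"), ("USA", "GPE"), ("Walmart", "ORG")]
companies = filter_by_label(ents, "ORG")
print("Company Names:", companies)    # → ['Tesla', 'Walmart']
print("Count:", len(companies))       # → 2
```

With spaCy installed, swapping the hardcoded pairs for real `doc` tokens or `doc.ents` gives exactly the solutions shown in the second notebook.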
