
Commit 6b19e9f

CU-8694w2cmw: Add example of k-fold metrics (#25)
1 parent 2282a67 commit 6b19e9f


2 files changed: +64 additions, -18 deletions


notebooks/introductory/Part_4_2_Supervised_Training_and_Meta_annotations.html

Lines changed: 43 additions & 18 deletions
@@ -13859,10 +13859,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla
 
 
 
-<div id="f7f355c5-69e7-4db0-b862-0108bbf9a1a5"></div>
+<div id="ceac96c5-9657-4da0-8345-7fa2de57788b"></div>
 <div class="output_subarea output_widget_view ">
 <script type="text/javascript">
-var element = $('#f7f355c5-69e7-4db0-b862-0108bbf9a1a5');
+var element = $('#ceac96c5-9657-4da0-8345-7fa2de57788b');
 </script>
 <script type="application/vnd.jupyter.widget-view+json">
 {"model_id": "6fd10f1692234019836a7b40e83b56dd", "version_major": 2, "version_minor": 0}
@@ -13881,10 +13881,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla
 
 
 
-<div id="1cd864ed-2f26-414a-9e7b-cca661359203"></div>
+<div id="92e06761-c3ed-4ca7-80a8-f1e70848b7f6"></div>
 <div class="output_subarea output_widget_view ">
 <script type="text/javascript">
-var element = $('#1cd864ed-2f26-414a-9e7b-cca661359203');
+var element = $('#92e06761-c3ed-4ca7-80a8-f1e70848b7f6');
 </script>
 <script type="application/vnd.jupyter.widget-view+json">
 {"model_id": "9a5ab9cfecc242b7aaf0f140e87bdde6", "version_major": 2, "version_minor": 0}
@@ -13963,10 +13963,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla
 
 
 
-<div id="6960f9a5-5688-4564-873d-9adbd34be108"></div>
+<div id="894a81fa-daca-4461-b9cf-8c8b2f318695"></div>
 <div class="output_subarea output_widget_view ">
 <script type="text/javascript">
-var element = $('#6960f9a5-5688-4564-873d-9adbd34be108');
+var element = $('#894a81fa-daca-4461-b9cf-8c8b2f318695');
 </script>
 <script type="application/vnd.jupyter.widget-view+json">
 {"model_id": "434496e448984f55925d22fad0349ada", "version_major": 2, "version_minor": 0}
@@ -13985,10 +13985,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla
 
 
 
-<div id="63d7255b-e667-4bea-af37-72e5372a0883"></div>
+<div id="f0c0c808-7ff5-4702-83c5-2526a1f39a68"></div>
 <div class="output_subarea output_widget_view ">
 <script type="text/javascript">
-var element = $('#63d7255b-e667-4bea-af37-72e5372a0883');
+var element = $('#f0c0c808-7ff5-4702-83c5-2526a1f39a68');
 </script>
 <script type="application/vnd.jupyter.widget-view+json">
 {"model_id": "f7d1803b3c6c4197b6612c5fdf189746", "version_major": 2, "version_minor": 0}
@@ -14007,10 +14007,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla
 
 
 
-<div id="dd55a253-1bc0-4801-a02f-de6f7145ad2f"></div>
+<div id="40209f7f-f501-410b-b86c-1dff1f4e15e8"></div>
 <div class="output_subarea output_widget_view ">
 <script type="text/javascript">
-var element = $('#dd55a253-1bc0-4801-a02f-de6f7145ad2f');
+var element = $('#40209f7f-f501-410b-b86c-1dff1f4e15e8');
 </script>
 <script type="application/vnd.jupyter.widget-view+json">
 {"model_id": "c8d633f579de438a916d9ef3de9d8fe0", "version_major": 2, "version_minor": 0}
@@ -14029,10 +14029,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla
 
 
 
-<div id="8dcdb0a2-c5bc-47ba-b4d8-98752ad7d19c"></div>
+<div id="ba9f83fa-677a-4478-977a-84f6680e1016"></div>
 <div class="output_subarea output_widget_view ">
 <script type="text/javascript">
-var element = $('#8dcdb0a2-c5bc-47ba-b4d8-98752ad7d19c');
+var element = $('#ba9f83fa-677a-4478-977a-84f6680e1016');
 </script>
 <script type="application/vnd.jupyter.widget-view+json">
 {"model_id": "de6c01c6983041e2b972f6008caefaea", "version_major": 2, "version_minor": 0}
@@ -14051,10 +14051,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla
 
 
 
-<div id="276a076a-ba10-46e9-bfe6-4b148d823c15"></div>
+<div id="2927e9a1-6999-48b3-bfe3-9cb426add119"></div>
 <div class="output_subarea output_widget_view ">
 <script type="text/javascript">
-var element = $('#276a076a-ba10-46e9-bfe6-4b148d823c15');
+var element = $('#2927e9a1-6999-48b3-bfe3-9cb426add119');
 </script>
 <script type="application/vnd.jupyter.widget-view+json">
 {"model_id": "05132c907a874fe2a2eb9cb6c81da3b3", "version_major": 2, "version_minor": 0}
@@ -17502,10 +17502,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla
 
 
 
-<div id="0345720f-a2f0-4660-a340-9c6e7ef44710"></div>
+<div id="4471f788-129e-42f5-b40b-fe386231b101"></div>
 <div class="output_subarea output_widget_view ">
 <script type="text/javascript">
-var element = $('#0345720f-a2f0-4660-a340-9c6e7ef44710');
+var element = $('#4471f788-129e-42f5-b40b-fe386231b101');
 </script>
 <script type="application/vnd.jupyter.widget-view+json">
 {"model_id": "00325922360c45009329d82ed6420f16", "version_major": 2, "version_minor": 0}
@@ -17524,10 +17524,10 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla
 
 
 
-<div id="293c269d-d3e1-472b-8a2d-c983f9bd3529"></div>
+<div id="ea307193-bdef-4dd8-970e-5fca075d9c90"></div>
 <div class="output_subarea output_widget_view ">
 <script type="text/javascript">
-var element = $('#293c269d-d3e1-472b-8a2d-c983f9bd3529');
+var element = $('#ea307193-bdef-4dd8-970e-5fca075d9c90');
 </script>
 <script type="application/vnd.jupyter.widget-view+json">
 {"model_id": "d48e2f4d6dd3467fb3f17e0244b0e361", "version_major": 2, "version_minor": 0}
@@ -17599,6 +17599,31 @@ <h2 id="Fine-tuning-the-NER+L&#160;model">Fine-tuning the NER+L&#160;model<a cla
 </div>
 </div>
 
+</div>
+<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
+</div><div class="inner_cell">
+<div class="text_cell_render border-box-sizing rendered_html">
+<h4 id="K-fold-metrics">K-fold metrics<a class="anchor-link" href="#K-fold-metrics">&#182;</a></h4><p>K-fold cross-validation offers a more robust evaluation of your model's performance by dividing your dataset into k subsets, or folds.
+Unlike a single evaluation on the entire dataset (like <code>cat._print_stats</code>), the k-fold approach ensures that every data point is used for both training and validation, thereby reducing the risk of bias and providing a more reliable estimate of the model's generalization capabilities.
+This method is particularly beneficial for assessing the fine-tuned performance of your model on specific datasets, as it accounts for variability and offers a comprehensive understanding of how the model might perform on unseen data.</p>
+
+</div>
+</div>
+</div>
+<div class="cell border-box-sizing code_cell rendered">
+<div class="input">
+<div class="prompt input_prompt">In&nbsp;[&nbsp;]:</div>
+<div class="inner_cell">
+<div class="input_area">
+<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># you need to import the module to use it</span>
+<span class="kn">from</span> <span class="nn">medcat.stats.kfold</span> <span class="kn">import</span> <span class="n">get_k_fold_stats</span>
+<span class="n">fps</span><span class="p">,</span> <span class="n">fns</span><span class="p">,</span> <span class="n">tps</span><span class="p">,</span> <span class="n">cui_prec</span><span class="p">,</span> <span class="n">cui_rec</span><span class="p">,</span> <span class="n">cui_f1</span><span class="p">,</span> <span class="n">cui_counts</span><span class="p">,</span> <span class="n">examples</span> <span class="o">=</span> <span class="n">get_k_fold_stats</span><span class="p">(</span><span class="n">cat</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
+</pre></div>
+
+</div>
+</div>
+</div>
+
 </div>
 <div class="cell border-box-sizing code_cell rendered">
 <div class="input">
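The K-fold metrics cell added in this commit describes the splitting idea in prose. As a rough illustration of the splitting step only, here is a minimal Python sketch, assuming the evaluation data is a plain list of documents and that folds are contiguous slices of that list. `k_fold_splits` is a hypothetical helper written for this note; it is not part of MedCAT, and `medcat.stats.kfold.get_k_fold_stats` may split the data differently (for example, by shuffling or by weighting documents).

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(docs: Sequence[T], k: int) -> List[Tuple[List[T], List[T]]]:
    """Hypothetical helper: return k (train, validation) pairs.

    Each document lands in exactly one validation fold and in the
    training set of the other k - 1 folds.
    """
    if not 2 <= k <= len(docs):
        raise ValueError("k must be between 2 and the number of documents")
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(len(docs), k)
    splits = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        validation = list(docs[start:start + size])
        train = list(docs[:start]) + list(docs[start + size:])
        splits.append((train, validation))
        start += size
    return splits


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(10)]
    for fold, (train, validation) in enumerate(k_fold_splits(docs, k=3)):
        # A real run would fine-tune a fresh copy of the model on `train`
        # and compute metrics on `validation` only.
        print(fold, len(train), len(validation))
```

With 10 documents and `k=3`, the validation folds hold 4, 3 and 3 documents, so every document is held out exactly once. Contiguous slices keep the sketch short; shuffling the list first is usually advisable when documents are ordered by source or date.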

notebooks/introductory/Part_4_2_Supervised_Training_and_Meta_annotations.ipynb

Lines changed: 21 additions & 0 deletions
@@ -4487,6 +4487,27 @@
 "fps, fns, tps, cui_prec, cui_rec, cui_f1, cui_counts, examples = cat._print_stats(data, extra_cui_filter=True)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"#### K-fold metrics\n",
+"K-fold cross-validation offers a more robust evaluation of your model's performance by dividing your dataset into k subsets, or folds.\n",
+"Unlike a single evaluation on the entire dataset (like `cat._print_stats`), the k-fold approach ensures that every data point is used for both training and validation, thereby reducing the risk of bias and providing a more reliable estimate of the model's generalization capabilities.\n",
+"This method is particularly beneficial for assessing the fine-tuned performance of your model on specific datasets, as it accounts for variability and offers a comprehensive understanding of how the model might perform on unseen data."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"# you need to import the module to use it\n",
+"from medcat.stats.kfold import get_k_fold_stats\n",
+"fps, fns, tps, cui_prec, cui_rec, cui_f1, cui_counts, examples = get_k_fold_stats(cat, data)"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 13,
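In the new notebook cell, `get_k_fold_stats(cat, data)` is unpacked into the same eight values as `cat._print_stats`. Per-fold counts still have to be combined into one set of scores. One common option, shown here as a sketch, is to pool the false positive, false negative and true positive counts per CUI across folds and then compute precision, recall and F1 from the pooled totals. This assumes each fold yields dicts mapping CUI to count, in the same `(fps, fns, tps)` order the notebook unpacks; `pooled_cui_metrics`, `precision_recall_f1` and the sample CUI are hypothetical, and MedCAT's helper may aggregate differently (for example, by averaging per-fold scores).

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed shape of fps / fns / tps for one fold: CUI -> count.
CuiCounts = Dict[str, int]


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard precision, recall and F1; 0.0 wherever a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def pooled_cui_metrics(
    folds: List[Tuple[CuiCounts, CuiCounts, CuiCounts]],
) -> Dict[str, Tuple[float, float, float]]:
    """Hypothetical helper: sum (fps, fns, tps) per CUI over folds, then score each CUI."""
    fps, fns, tps = Counter(), Counter(), Counter()
    for fold_fps, fold_fns, fold_tps in folds:
        # Counter.update with a mapping adds the counts rather than replacing them.
        fps.update(fold_fps)
        fns.update(fold_fns)
        tps.update(fold_tps)
    cuis = set(fps) | set(fns) | set(tps)
    # Missing keys in a Counter read as 0, so CUIs absent from a fold are safe.
    return {cui: precision_recall_f1(tps[cui], fps[cui], fns[cui]) for cui in cuis}


if __name__ == "__main__":
    # Two made-up folds for an illustrative CUI "C0000001".
    folds = [
        ({"C0000001": 1}, {"C0000001": 1}, {"C0000001": 3}),
        ({}, {"C0000001": 1}, {"C0000001": 1}),
    ]
    print(pooled_cui_metrics(folds))
```

Pooling counts weights each fold by how often a CUI occurs in it, while averaging per-fold scores gives every fold equal weight. Which is more appropriate depends on how evenly each CUI is spread across the folds.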
