CogStack/MedCATtutorials

Files changed:
  .github/workflows/main.yml: 2 additions & 2 deletions
  README.md: 1 addition & 0 deletions
  notebooks/introductory/Part_1_1_OPTIONAL_Logging_With_MedCAT.html: 1 addition & 1 deletion
  notebooks/introductory/Part_1_1_OPTIONAL_Logging_With_MedCAT.ipynb: 9 additions & 1 deletion
  notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.html: 7 additions & 6 deletions
  notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.ipynb: 20 additions & 2 deletions
  notebooks/introductory/Part_3_2_Extracting_Diseases_from_Electronic_Health_Records.html: 7 additions & 6 deletions
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -9,7 +9,7 @@ on:
 jobs:
   main:
 
-    runs-on: macos-10.15
+    runs-on: macos-11
     strategy:
       matrix:
         part: [
@@ -27,7 +27,7 @@ jobs:
       - name: Setup Python
         uses: actions/setup-python@v2
         with:
-          python-version: "3.7"
+          python-version: "3.8"
       - name: Install dependencies
         run: |
           pip install -U pip
 
diff --git a/README.md b/README.md
@@ -13,6 +13,7 @@ In this tutorial, we will walk you through each stage of a basic MedCAT project.
 | 2    | [Data set Preparation and Basic Statistics](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_2_Dataset_Analysis_and_Preparation.html)                                    | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_2_Dataset_Analysis_and_Preparation.ipynb) | [TDS](https://towardsdatascience.com/medcat-dataset-analysis-and-preparation-be8bc910bd6d)         |
 | 3.1  | [Building a new Concept Database (CDB) and Vocabulary (Vocab)](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.html)                 | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.ipynb) | [TDS](https://towardsdatascience.com/medcat-extracting-diseases-from-electronic-health-records-f53c45b3d1c1)         |
 | 3.2  | [Unsupervised training and NER+L](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_2_Extracting_Diseases_from_Electronic_Health_Records.html)                                             | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_2_Extracting_Diseases_from_Electronic_Health_Records.ipynb) | [TDS](https://towardsdatascience.com/medcat-extracting-diseases-from-electronic-health-records-f53c45b3d1c1)         |
+| 3.3  | [Technical model optimisations](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_3_Model_technical_optimisations.html)                                             | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_3_Model_technical_optimisations.ipynb) | -         |
 | 4.1  | [Creating a tokenizer model (huggingface) and embeddings for MetaAnnotations](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_1_ByteLevelBPETokenizer_and_Embeddings.html) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_1_ByteLevelBPETokenizer_and_Embeddings.ipynb) | -         |
 | 4.2  | [Supervised training and fine-tuning + Meta-annotations](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_2_Supervised_Training_and_Meta_annotations.html)                      | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_2_Supervised_Training_and_Meta_annotations.ipynb) | -         |
 | 4.3  | [Annotating documents with the full MedCAT pipeline with MetaAnnotations](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_3_Annotating_documents_with_the_full_MedCAT_pipeline_with_MetaAnnotations.html)     | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_3_Annotating_documents_with_the_full_MedCAT_pipeline_with_MetaAnnotations.ipynb) | -         |
 
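diff --git a/notebooks/introductory/Part_1_1_OPTIONAL_Logging_With_MedCAT.html b/notebooks/introductory/Part_1_1_OPTIONAL_Logging_With_MedCAT.html
@@ -13095,7 +13095,7 @@ <h1 id="MedCAT-tutorial---logging-with-MedCAT">MedCAT tutorial - logging with Me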
 <div class="inner_cell">
     <div class="input_area">
 <div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Install medcat</span>
-<span class="o">!</span> pip install <span class="nv">medcat</span><span class="o">==</span><span class="m">1</span>.5.0
+<span class="o">!</span> pip install <span class="nv">medcat</span><span class="o">==</span><span class="m">1</span>.8.0
 <span class="k">try</span><span class="p">:</span>
     <span class="kn">from</span> <span class="nn">medcat.cat</span> <span class="kn">import</span> <span class="n">CAT</span>
 <span class="k">except</span><span class="p">:</span>
 
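diff --git a/notebooks/introductory/Part_1_1_OPTIONAL_Logging_With_MedCAT.ipynb b/notebooks/introductory/Part_1_1_OPTIONAL_Logging_With_MedCAT.ipynb
@@ -1,6 +1,7 @@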
 {
  "cells": [
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -18,7 +19,7 @@
    "outputs": [],
    "source": [
     "# Install medcat\n",
-    "! pip install medcat==1.5.0\n",
+    "! pip install medcat==1.8.0\n",
     "try:\n",
     "    from medcat.cat import CAT\n",
     "except:\n",
@@ -27,6 +28,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -62,13 +64,15 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "What we must now understand is that the `logging` library uses a hierarchical system for the loggers. That means that all the module-level loggers within MedCAT have the `medcat.logger` (which is the package-level logger) as their parent logger. So if we want to change the logging behaviour for the entire project, we can just interact with this one logger. However, if we want fine grained control, we can interact with each module-level logger separately."
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -100,6 +104,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -111,6 +116,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -136,6 +142,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -172,6 +179,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
 
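The Part_1_1 notebook text in the diff above explains that MedCAT's module-level loggers sit under a single package-level logger, so one setting on the parent changes behaviour for the whole project while each child can still be tuned separately. A minimal sketch of that hierarchy using only Python's standard `logging` module follows; it does not import MedCAT, and the logger names `medcat.cat` and `medcat.vocab` are illustrative assumptions chosen only because a `medcat.` prefix makes them children of the logger named `medcat`.

```python
import io
import logging

# Package-level logger. In this sketch it is just the logger named "medcat";
# nothing from the MedCAT package is imported.
package_logger = logging.getLogger("medcat")

# Module-level loggers (illustrative names): the "medcat." prefix makes them children.
cat_logger = logging.getLogger("medcat.cat")
vocab_logger = logging.getLogger("medcat.vocab")

# Attach one handler to the parent only; child records propagate up to it.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s:%(levelname)s:%(message)s"))
package_logger.addHandler(handler)

# Coarse control: the parent's level applies to every child without a level of its own.
package_logger.setLevel(logging.INFO)
cat_logger.info("loading model pack")     # emitted through the parent's handler
vocab_logger.debug("adding word 'test'")  # dropped: effective level is INFO

# Fine-grained control: override a single module-level logger.
vocab_logger.setLevel(logging.DEBUG)
vocab_logger.debug("adding word 'test'")  # now emitted

print(cat_logger.parent is package_logger)
print(stream.getvalue(), end="")
```

Only the parent carries a handler here, so every child's output ends up in the same place; lowering the level on `vocab_logger` alone changes what that one module emits without touching the rest of the hierarchy.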
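diff --git a/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.html b/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.html
@@ -13099,7 +13099,7 @@ <h3 id="First-we-need-to-install-MedCAT">First we need to install MedCAT<a class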
 <div class="inner_cell">
     <div class="input_area">
 <div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Install MedCAT</span>
-<span class="o">!</span> pip install <span class="nv">medcat</span><span class="o">==</span><span class="m">1</span>.5.0
+<span class="o">!</span> pip install <span class="nv">medcat</span><span class="o">==</span><span class="m">1</span>.8.0
 <span class="c1"># Get the scispacy model</span>
 <span class="o">!</span> python -m spacy download en_core_web_md
 <span class="k">try</span><span class="p">:</span>
@@ -13445,7 +13445,8 @@ <h3 id="First-we-need-to-install-MedCAT">First we need to install MedCAT<a class
 <div class="prompt input_prompt">In&nbsp;[2]:</div>
 <div class="inner_cell">
     <div class="input_area">
-<div class=" highlight hl-ipython3"><pre><span></span><span class="n">DATA_DIR</span> <span class="o">=</span> <span class="s2">&quot;./data/&quot;</span>
+<div class=" highlight hl-ipython3"><pre><span></span><span class="n">DATA_DIR</span> <span class="o">=</span> <span class="s2">&quot;./data_p3.1/&quot;</span>
+<span class="o">!</span> <span class="nv">DATA_DIR</span><span class="o">=</span><span class="s2">&quot;./data_p3.1/&quot;</span>
 </pre></div>
 
     </div>
@@ -13459,9 +13460,9 @@ <h3 id="First-we-need-to-install-MedCAT">First we need to install MedCAT<a class
 <div class="inner_cell">
     <div class="input_area">
 <div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Load files if in google colab, otherwise skip this step</span>
-<span class="o">!</span>wget https://raw.githubusercontent.com/CogStack/MedCATtutorials/main/notebooks/introductory/data/cdb_simple.csv -P ./data/
-<span class="o">!</span>wget https://raw.githubusercontent.com/CogStack/MedCATtutorials/main/notebooks/introductory/data/cdb_advanced.csv -P ./data/
-<span class="o">!</span>wget https://raw.githubusercontent.com/CogStack/MedCATtutorials/main/notebooks/introductory/data/vocab_data.txt -P ./data/
+<span class="o">!</span>wget -N https://raw.githubusercontent.com/CogStack/MedCATtutorials/main/notebooks/introductory/data/cdb_simple.csv -P <span class="nv">$DATA_DIR</span>
+<span class="o">!</span>wget -N https://raw.githubusercontent.com/CogStack/MedCATtutorials/main/notebooks/introductory/data/cdb_advanced.csv -P <span class="nv">$DATA_DIR</span>
+<span class="o">!</span>wget -N https://raw.githubusercontent.com/CogStack/MedCATtutorials/main/notebooks/introductory/data/vocab_data.txt -P <span class="nv">$DATA_DIR</span>
 </pre></div>
 
     </div>
@@ -13660,7 +13661,7 @@ <h2 id="Building-a-Vocabulary">Building a Vocabulary<a class="anchor-link" href=
 <div class="inner_cell">
     <div class="input_area">
 <div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># If you want to add words manually (one by one) use:</span>
-<span class="n">vocab</span><span class="o">.</span><span class="n">add_word</span><span class="p">(</span><span class="s2">&quot;test&quot;</span><span class="p">,</span> <span class="n">cnt</span><span class="o">=</span><span class="mi">31</span><span class="p">,</span> <span class="n">vec</span><span class="o">=</span><span class="p">[</span><span class="mf">1.42</span><span class="p">,</span> <span class="mf">1.44</span><span class="p">,</span> <span class="mf">1.55</span><span class="p">],</span> <span class="n">replace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
+<span class="n">vocab</span><span class="o">.</span><span class="n">add_word</span><span class="p">(</span><span class="s2">&quot;test&quot;</span><span class="p">,</span> <span class="n">cnt</span><span class="o">=</span><span class="mi">31</span><span class="p">,</span> <span class="n">vec</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">1.42</span><span class="p">,</span> <span class="mf">1.44</span><span class="p">,</span> <span class="mf">1.55</span><span class="p">]),</span> <span class="n">replace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
 <span class="n">vocab</span><span class="o">.</span><span class="n">vocab</span><span class="o">.</span><span class="n">keys</span><span class="p">()</span>
 </pre></div>
 
 
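diff --git a/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.ipynb b/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.ipynb
@@ -1,6 +1,7 @@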
 {
  "cells": [
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "s_j_Gu7s3wTO"
@@ -10,6 +11,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "i4bQfWfXlKWJ"
@@ -320,7 +322,7 @@
    ],
    "source": [
     "# Install MedCAT\n",
-    "! pip install medcat==1.5.0\n",
+    "! pip install medcat==1.8.0\n",
     "# Get the scispacy model\n",
     "! python -m spacy download en_core_web_md\n",
     "try:\n",
@@ -331,6 +333,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "LWScf8BW0BpY"
@@ -428,6 +431,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "Kj24ZU79D-xE"
@@ -441,6 +445,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "9POZ_dwsk7gu"
@@ -492,6 +497,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "xPl6ghXUk7gy"
@@ -670,6 +676,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "xG3FCinSl_Sq"
@@ -691,6 +698,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "o6itJcEXk7hA"
@@ -711,6 +719,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "-YBbwcNUk7hD"
@@ -731,6 +740,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "ptRmHln9k7hG"
@@ -942,6 +952,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "Rasu5PajojYZ"
@@ -998,6 +1009,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "08agsFBnk7hQ"
@@ -1107,6 +1119,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "Lpx7zGvwk7ha"
@@ -1127,6 +1140,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "97uiDwvAk7hc"
@@ -1159,6 +1173,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "0xqmmAue-UE4"
@@ -1205,6 +1220,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "cqmVITvWCIr6"
@@ -1214,6 +1230,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "DZvhmkIL8433"
@@ -1326,6 +1343,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "id": "9fwiKys4k7he"
@@ -1358,7 +1376,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.5"
+   "version": "3.9.6"
   },
   "vscode": {
    "interpreter": {
 
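diff --git a/notebooks/introductory/Part_3_2_Extracting_Diseases_from_Electronic_Health_Records.html b/notebooks/introductory/Part_3_2_Extracting_Diseases_from_Electronic_Health_Records.html
@@ -13092,7 +13092,7 @@ <h1 id="Now-let's-start-extracting-concepts-from-unstructured-text!">Now let's s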
 <div class="inner_cell">
     <div class="input_area">
 <div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Install medcat</span>
-<span class="o">!</span> pip install <span class="nv">medcat</span><span class="o">==</span><span class="m">1</span>.5.0
+<span class="o">!</span> pip install <span class="nv">medcat</span><span class="o">==</span><span class="m">1</span>.8.0
 <span class="k">try</span><span class="p">:</span>
     <span class="kn">from</span> <span class="nn">medcat.cat</span> <span class="kn">import</span> <span class="n">CAT</span>
 <span class="k">except</span><span class="p">:</span>
@@ -13417,7 +13417,8 @@ <h1 id="Now-let's-start-extracting-concepts-from-unstructured-text!">Now let's s
 <div class="prompt input_prompt">In&nbsp;[2]:</div>
 <div class="inner_cell">
     <div class="input_area">
-<div class=" highlight hl-ipython3"><pre><span></span><span class="n">DATA_DIR</span> <span class="o">=</span> <span class="s2">&quot;./data/&quot;</span>
+<div class=" highlight hl-ipython3"><pre><span></span><span class="n">DATA_DIR</span> <span class="o">=</span> <span class="s2">&quot;./data_p3.2/&quot;</span>
+<span class="o">!</span> <span class="nv">DATA_DIR</span><span class="o">=</span><span class="s2">&quot;./data_p3.2/&quot;</span>
 <span class="n">model_pack_path</span> <span class="o">=</span> <span class="n">DATA_DIR</span> <span class="o">+</span> <span class="s2">&quot;medmen_wstatus_2021_oct.zip&quot;</span>
 </pre></div>
 
@@ -13432,8 +13433,8 @@ <h1 id="Now-let's-start-extracting-concepts-from-unstructured-text!">Now let's s
 <div class="inner_cell">
     <div class="input_area">
 <div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Download the models and required data</span>
-<span class="o">!</span>wget https://medcat.rosalind.kcl.ac.uk/media/medmen_wstatus_2021_oct.zip -P ./data/
-<span class="o">!</span>wget https://raw.githubusercontent.com/CogStack/MedCATtutorials/main/notebooks/introductory/data/pt_notes.csv -P ./data/
+<span class="o">!</span>wget -N https://medcat.rosalind.kcl.ac.uk/media/medmen_wstatus_2021_oct.zip -P <span class="nv">$DATA_DIR</span>
+<span class="o">!</span>wget -N https://raw.githubusercontent.com/CogStack/MedCATtutorials/main/notebooks/introductory/data/pt_notes.csv -P <span class="nv">$DATA_DIR</span>
 </pre></div>
 
     </div>
@@ -14695,10 +14696,10 @@ <h2 id="Use-Multiprocessing">Use Multiprocessing<a class="anchor-link" href="#Us
 
 
 
-<div id="baa73754-e2b0-4efb-a3ff-3f7829c43498"></div>
+<div id="25a2996e-6950-4a46-b96a-2244bd1dbcec"></div>
 <div class="output_subarea output_widget_view ">
 <script type="text/javascript">
-var element = $('#baa73754-e2b0-4efb-a3ff-3f7829c43498');
+var element = $('#25a2996e-6950-4a46-b96a-2244bd1dbcec');
 </script>
 <script type="application/vnd.jupyter.widget-view+json">
 {"model_id": "05b18c97da9d4d05b9280df006a5fb82", "version_major": 2, "version_minor": 0}