|
78 | 78 | "- Translation: translate a text in another language.\n",
|
79 | 79 | "- Feature extraction: return a tensor representation of the text.\n",
|
80 | 80 | "\n",
|
81 |
| - "Let's see how this work for sentiment analysis (the other tasks are all covered in the [task summary](https://huggingface.co/transformers/task_summary.html)):" |
| 81 | + "Let's see how this work for sentiment analysis (the other tasks are all covered in the [task summary](https://huggingface.co/transformers/task_summary.html)):\n", |
| 82 | + "\n", |
| 83 | + "Install the following dependencies (if not already installed):" |
| 84 | + ] |
| 85 | + }, |
| 86 | + { |
| 87 | + "cell_type": "code", |
| 88 | + "execution_count": null, |
| 89 | + "metadata": {}, |
| 90 | + "outputs": [], |
| 91 | + "source": [ |
| 92 | + "! pip install torch" |
82 | 93 | ]
|
83 | 94 | },
|
84 | 95 | {
|
|
109 | 120 | {
|
110 | 121 | "data": {
|
111 | 122 | "text/plain": [
|
112 |
| - "[{'label': 'POSITIVE', 'score': 0.9997795224189758}]" |
| 123 | + "[{'label': 'POSITIVE', 'score': 0.9998}]" |
113 | 124 | ]
|
114 | 125 | },
|
115 | 126 | "execution_count": null,
|
|
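The code cell that produces the output above sits outside the hunks shown in this diff. A minimal sketch of what it presumably contains, assuming `transformers` itself is installed alongside `torch` and that the pipeline downloads its default English sentiment model:

```python
# Hedged sketch: the actual cell is not part of this diff.
from transformers import pipeline

# Downloads and caches the default sentiment-analysis model on first use.
classifier = pipeline("sentiment-analysis")
classifier("We are very happy to show you the 🤗 Transformers library.")
# Expected output, rounded as in the diff: [{'label': 'POSITIVE', 'score': 0.9998}]
```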
125 | 136 | "cell_type": "markdown",
|
126 | 137 | "metadata": {},
|
127 | 138 | "source": [
|
128 |
| - "That's encouraging! You can use it on a list of sentences, which will be preprocessed then fed to the model as a\n", |
129 |
| - "*batch*, returning a list of dictionaries like this one:" |
| 139 | + "That's encouraging! You can use it on a list of sentences, which will be preprocessed then fed to the model, returning\n", |
| 140 | + "a list of dictionaries like this one:" |
130 | 141 | ]
|
131 | 142 | },
|
132 | 143 | {
|
|
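The cell that feeds several sentences to the classifier is likewise elided from this diff. A hedged sketch, reusing the `classifier` object from above; the second sentence and the printing format are assumptions based on the surrounding commentary about a fairly neutral negative score:

```python
# Hedged sketch: the sentence list is illustrative, not taken from the diff.
results = classifier([
    "We are very happy to show you the 🤗 Transformers library.",
    "We hope you don't hate it.",
])
for result in results:
    print(f"label: {result['label']}, score: {round(result['score'], 4)}")
```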
157 | 168 | "cell_type": "markdown",
|
158 | 169 | "metadata": {},
|
159 | 170 | "source": [
|
| 171 | + "To use with a large dataset, look at [iterating over a pipeline](https://huggingface.co/transformers/./main_classes/pipelines.html)\n", |
| 172 | + "\n", |
160 | 173 | "You can see the second sentence has been classified as negative (it needs to be positive or negative) but its score is\n",
|
161 | 174 | "fairly neutral.\n",
|
162 | 175 | "\n",
|
|
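The new sentence added at line 171 only links to the pipelines documentation. As a rough illustration of what iterating over a pipeline means, a pipeline can also be fed a generator and then yields predictions lazily; a minimal sketch (the generator and its contents are made up here):

```python
# Hedged sketch: streaming inputs through the pipeline keeps memory use flat on large datasets.
def data():
    for i in range(1000):
        yield f"Example sentence number {i}"

for prediction in classifier(data()):
    print(prediction)  # e.g. {'label': 'POSITIVE', 'score': ...}
```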
338 | 351 | {
|
339 | 352 | "data": {
|
340 | 353 | "text/plain": [
|
341 |
| - "{'input_ids': [101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}" |
| 354 | + "{'input_ids': [101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102],\n", |
| 355 | + " 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}" |
342 | 356 | ]
|
343 | 357 | },
|
344 | 358 | "execution_count": null,
|
|
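The tokenizer call that produces the dictionary above is not visible in this diff. A minimal sketch, assuming the checkpoint used by the default sentiment pipeline (an assumption, not something stated in the hunks shown):

```python
# Hedged sketch: the checkpoint name is an assumption.
from transformers import AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("We are very happy to show you the 🤗 Transformers library.")
print(inputs)  # {'input_ids': [101, 2057, ...], 'attention_mask': [1, 1, ...]}
```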
453 | 467 | "data": {
|
454 | 468 | "text/plain": [
|
455 | 469 | "SequenceClassifierOutput(loss=None, logits=tensor([[-4.0833, 4.3364],\n",
|
456 |
| - " [ 0.0818, -0.0418]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)" |
| 470 | + " [ 0.0818, -0.0418]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)" |
457 | 471 | ]
|
458 | 472 | },
|
459 | 473 | "execution_count": null,
|
|
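The forward pass that returns this `SequenceClassifierOutput` is also outside the diff. A hedged sketch, with the `pt_model` and `pt_batch` names assumed from the `pt_*` convention used later in this file:

```python
# Hedged sketch: pt_batch is assumed to be the tokenized inputs as PyTorch tensors.
import torch

pt_outputs = pt_model(**pt_batch)  # SequenceClassifierOutput(loss=None, logits=...)
probs = torch.nn.functional.softmax(pt_outputs.logits, dim=-1)  # turn logits into probabilities
```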
542 | 556 | "data": {
|
543 | 557 | "text/plain": [
|
544 | 558 | "SequenceClassifierOutput(loss=tensor(0.3167, grad_fn=<NllLossBackward>), logits=tensor([[-4.0833, 4.3364],\n",
|
545 |
| - "[ 0.0818, -0.0418]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)" |
| 559 | + " [ 0.0818, -0.0418]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)" |
546 | 560 | ]
|
547 | 561 | },
|
548 | 562 | "execution_count": null,
|
|
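The only difference between this output and the previous one is the populated `loss`; passing labels alongside the batch is what produces it. A hedged sketch (the label values are illustrative):

```python
# Hedged sketch: when labels are provided, the model also computes and returns the loss.
import torch

pt_outputs = pt_model(**pt_batch, labels=torch.tensor([1, 0]))
print(pt_outputs.loss)  # tensor(0.3167, ...) as in the output above
```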
588 | 602 | "metadata": {},
|
589 | 603 | "outputs": [],
|
590 | 604 | "source": [
|
591 |
| - "tokenizer.save_pretrained(save_directory)\n", |
592 |
| - "model.save_pretrained(save_directory)" |
| 605 | + "pt_save_directory = './pt_save_pretrained'\n", |
| 606 | + "tokenizer.save_pretrained(pt_save_directory)\n", |
| 607 | + "pt_model.save_pretrained(pt_save_directory)" |
| 608 | + ] |
| 609 | + }, |
| 610 | + { |
| 611 | + "cell_type": "code", |
| 612 | + "execution_count": null, |
| 613 | + "metadata": {}, |
| 614 | + "outputs": [], |
| 615 | + "source": [ |
| 616 | + "tf_save_directory = './tf_save_pretrained'\n", |
| 617 | + "tokenizer.save_pretrained(tf_save_directory)\n", |
| 618 | + "tf_model.save_pretrained(tf_save_directory)" |
593 | 619 | ]
|
594 | 620 | },
|
595 | 621 | {
|
|
609 | 635 | "outputs": [],
|
610 | 636 | "source": [
|
611 | 637 | "from transformers import TFAutoModel\n",
|
612 |
| - "tokenizer = AutoTokenizer.from_pretrained(save_directory)\n", |
613 |
| - "model = TFAutoModel.from_pretrained(save_directory, from_pt=True)" |
| 638 | + "tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)\n", |
| 639 | + "tf_model = TFAutoModel.from_pretrained(pt_save_directory, from_pt=True)" |
614 | 640 | ]
|
615 | 641 | },
|
616 | 642 | {
|
|
627 | 653 | "outputs": [],
|
628 | 654 | "source": [
|
629 | 655 | "from transformers import AutoModel\n",
|
630 |
| - "tokenizer = AutoTokenizer.from_pretrained(save_directory)\n", |
631 |
| - "model = AutoModel.from_pretrained(save_directory, from_tf=True)" |
| 656 | + "tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)\n", |
| 657 | + "pt_model = AutoModel.from_pretrained(tf_save_directory, from_tf=True)" |
632 | 658 | ]
|
633 | 659 | },
|
634 | 660 | {
|
|