Commit d59034f

🌐[i18n-KO] Translate autoclass_tutorial to Korean and Fix the typo of quicktour (#22533)
translate the autoclass_tutorial and fix the typo of the quicktour
1 parent ee8e80a commit d59034f

3 files changed: +142 -3 lines changed

docs/source/ko/_toctree.yml (+2 -2)

@@ -9,8 +9,8 @@
 - sections:
   - local: in_translation
     title: (In translation) Pipelines for inference
-  - local: in_translation
-    title: (In translation) Load pretrained instances with an AutoClass
+  - local: autoclass_tutorial
+    title: Load pretrained instances with an AutoClass
   - local: in_translation
     title: (In translation) Preprocess
   - local: in_translation
docs/source/ko/autoclass_tutorial.mdx (+139)

@@ -0,0 +1,139 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Load pretrained instances with an AutoClass[[Load pretrained instances with an AutoClass]]

With so many different Transformer architectures, it can be challenging to create one for your checkpoint. As part of the Transformers core philosophy of making the library easy, simple and flexible to use, an `AutoClass` automatically infers and loads the correct architecture from a given checkpoint. The `from_pretrained()` method lets you quickly load a pretrained model for any architecture, so you don't have to devote time and resources to training a model from scratch. Producing this kind of checkpoint-agnostic code means that if your code works for one checkpoint, it will work for another checkpoint, as long as it was trained for a similar task, even if the architecture is different.
<Tip>

Remember, architecture refers to the skeleton of the model, and a checkpoint is the weights for a given architecture. For example, [BERT](https://huggingface.co/bert-base-uncased) is an architecture, while `bert-base-uncased` is a checkpoint. Model is a general term that can mean either architecture or checkpoint.

</Tip>
In this tutorial, you will learn to:

* Load a pretrained tokenizer.
* Load a pretrained image processor.
* Load a pretrained feature extractor.
* Load a pretrained processor.
* Load a pretrained model.
## AutoTokenizer

Nearly every NLP task begins with a tokenizer. A tokenizer converts your input into a format the model can process.

Load a tokenizer with [`AutoTokenizer.from_pretrained`]:
```py
>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
```
Then tokenize your input as shown below:

```py
>>> sequence = "In a hole in the ground there lived a hobbit."
>>> print(tokenizer(sequence))
{'input_ids': [101, 1999, 1037, 4920, 1999, 1996, 2598, 2045, 2973, 1037, 7570, 10322, 4183, 1012, 102],
 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
```
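
As a quick sanity check, `decode` maps the `input_ids` back to text and reveals the special tokens the tokenizer added (a sketch; the exact output below assumes the `bert-base-uncased` checkpoint loaded above):

```py
>>> encoded = tokenizer(sequence)
>>> # Decoding shows the [CLS] and [SEP] special tokens BERT expects
>>> tokenizer.decode(encoded["input_ids"])
'[CLS] in a hole in the ground there lived a hobbit. [SEP]'
```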
## AutoImageProcessor

For vision tasks, an image processor processes the image into the correct input format.

```py
>>> from transformers import AutoImageProcessor

>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
```
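
For illustration, here is a minimal sketch of feeding the loaded image processor a PIL image; the sample URL is a COCO image commonly used in the docs, and the output shape assumes the `google/vit-base-patch16-224` checkpoint:

```py
>>> from PIL import Image
>>> import requests

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> # Resizes, rescales, and normalizes the image into a batched tensor
>>> inputs = image_processor(image, return_tensors="pt")
>>> inputs["pixel_values"].shape
torch.Size([1, 3, 224, 224])
```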
## AutoFeatureExtractor

For audio tasks, a feature extractor processes the audio signal into the correct input format.

Load a feature extractor with [`AutoFeatureExtractor.from_pretrained`]:
```py
>>> from transformers import AutoFeatureExtractor

>>> feature_extractor = AutoFeatureExtractor.from_pretrained(
...     "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
... )
```
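
As a rough sketch of how it is used, the feature extractor can be called on a raw waveform array; a dummy second of silence at 16 kHz stands in for real audio here:

```py
>>> import numpy as np

>>> # Dummy waveform: one second of silence sampled at 16 kHz
>>> raw_audio = np.zeros(16000, dtype=np.float32)
>>> inputs = feature_extractor(raw_audio, sampling_rate=16000, return_tensors="pt")
>>> inputs["input_values"].shape
torch.Size([1, 16000])
```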
## AutoProcessor

Multimodal tasks require a processor that combines two types of preprocessing tools. For example, the LayoutLMV2 model requires an image processor to handle images and a tokenizer to handle text; a processor combines both of them.

Load a processor with [`AutoProcessor.from_pretrained`]:

```py
>>> from transformers import AutoProcessor

>>> processor = AutoProcessor.from_pretrained("microsoft/layoutlmv2-base-uncased")
```
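
For a sense of what the combined processor returns, here is a hedged sketch: `document.png` is a hypothetical scanned page, and the LayoutLMv2 processor runs OCR by default, which assumes `pytesseract` is installed:

```py
>>> from PIL import Image

>>> image = Image.open("document.png").convert("RGB")  # hypothetical local file

>>> # OCR extracts words and bounding boxes; the tokenizer encodes them
>>> encoding = processor(image, return_tensors="pt")
>>> list(encoding.keys())
['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image']
```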
## AutoModel

<frameworkcontent>
<pt>
Finally, the `AutoModelFor` classes let you load a pretrained model for a given task (see [here](model_doc/auto) for a complete list of available tasks). For example, load a model for sequence classification with [`AutoModelForSequenceClassification.from_pretrained`]:

```py
>>> from transformers import AutoModelForSequenceClassification

>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
```
Easily reuse the same checkpoint to load an architecture for a different task:

```py
>>> from transformers import AutoModelForTokenClassification

>>> model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased")
```
<Tip warning={true}>

For PyTorch models, the `from_pretrained()` method uses `torch.load()`, which internally relies on pickle and is known to be insecure. In general, never load a model that could have come from an untrusted source or that could have been tampered with. This security risk is partially mitigated for public models hosted on the Hugging Face Hub, which are [scanned for malware](https://huggingface.co/docs/hub/security-malware) at each commit. See the [Hub documentation](https://huggingface.co/docs/hub/security) for best practices like [signed commit verification](https://huggingface.co/docs/hub/security-gpg#signing-commits-with-gpg) with GPG.

TensorFlow and Flax checkpoints are not affected, and you can work around this issue by loading them with the `from_tf` and `from_flax` keyword arguments of the `from_pretrained` method.
</Tip>

Generally, we recommend using the `AutoTokenizer` class and the `AutoModelFor` class to load pretrained instances of models. This ensures you load the correct architecture every time. In the next [tutorial](preprocessing), learn how to use your newly loaded tokenizer, image processor, feature extractor and processor to preprocess a dataset for fine-tuning.
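
Putting the pieces together, a minimal end-to-end sketch might look like the following; note that the classification head of `distilbert-base-uncased` is freshly initialized here, so the logits are meaningless until the model is fine-tuned:

```py
>>> import torch
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

>>> inputs = tokenizer("In a hole in the ground there lived a hobbit.", return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> logits.shape  # two labels by default for the randomly initialized head
torch.Size([1, 2])
```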
</pt>
<tf>
Finally, the `TFAutoModelFor` classes let you load a pretrained model for a given task (see [here](model_doc/auto) for a complete list of available tasks). For example, load a model for sequence classification with [`TFAutoModelForSequenceClassification.from_pretrained`]:

```py
>>> from transformers import TFAutoModelForSequenceClassification

>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
```
Easily reuse the same checkpoint to load an architecture for a different task:

```py
>>> from transformers import TFAutoModelForTokenClassification

>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased")
```
Generally, we recommend using the `AutoTokenizer` class and the `TFAutoModelFor` class to load pretrained instances of models. This ensures you load the correct architecture every time. In the next [tutorial](preprocessing), learn how to use your newly loaded tokenizer, image processor, feature extractor and processor to preprocess a dataset for fine-tuning.
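
The TensorFlow counterpart to the end-to-end sketch above (again, the classification head is randomly initialized until the model is fine-tuned):

```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

>>> inputs = tokenizer("In a hole in the ground there lived a hobbit.", return_tensors="tf")
>>> logits = model(**inputs).logits
>>> logits.shape
TensorShape([1, 2])
```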
</tf>
</frameworkcontent>

docs/source/ko/quicktour.mdx (+1 -1)

@@ -168,7 +168,7 @@ label: NEGATIVE, with score: 0.5309
 
 ### AutoTokenizer
 
-ν† ν°λ‚˜μ΄μ €λŠ” μ „μ²˜λ¦¬λ₯Ό λ‹΄λ‹Ήν•˜λ©°, ν…μŠ€νŠΈλ₯Ό λͺ¨λΈμ΄ 받을 숫자 λ°°μ—΄λ‘œ λ°”κΏ‰λ‹ˆλ‹€. 토큰화 κ³Όμ •μ—λŠ” 단어λ₯Ό μ–΄λ””μ—μ„œ λŠμ„μ§€, μ–Όλ§ŒνΌ λ‚˜λˆŒμ§€ 등을 ν¬ν•¨ν•œ μ—¬λŸ¬ κ·œμΉ™μ΄ μžˆμŠ΅λ‹ˆλ‹€. μžμ„Έν•œ λ‚΄μš©μ€ [ν† ν¬λ‚˜μ΄μ € μš”μ•½](./tokenizer_summary)λ₯Ό ν™•μΈν•΄μ£Όμ„Έμš”. 제일 μ€‘μš”ν•œ 점은 λͺ¨λΈμ΄ ν›ˆλ ¨λμ„ λ•Œμ™€ λ™μΌν•œ 토큰화 κ·œμΉ™μ„ 쓰도둝 λ™μΌν•œ λͺ¨λΈ μ΄λ¦„μœΌλ‘œ ν† ν¬λ‚˜μ΄μ € μΈμŠ€ν„΄μŠ€λ₯Ό λ§Œλ“€μ–΄μ•Ό ν•©λ‹ˆλ‹€.
+ν† ν¬λ‚˜μ΄μ €λŠ” μ „μ²˜λ¦¬λ₯Ό λ‹΄λ‹Ήν•˜λ©°, ν…μŠ€νŠΈλ₯Ό λͺ¨λΈμ΄ 받을 숫자 λ°°μ—΄λ‘œ λ°”κΏ‰λ‹ˆλ‹€. 토큰화 κ³Όμ •μ—λŠ” 단어λ₯Ό μ–΄λ””μ—μ„œ λŠμ„μ§€, μ–Όλ§ŒνΌ λ‚˜λˆŒμ§€ 등을 ν¬ν•¨ν•œ μ—¬λŸ¬ κ·œμΉ™μ΄ μžˆμŠ΅λ‹ˆλ‹€. μžμ„Έν•œ λ‚΄μš©μ€ [ν† ν¬λ‚˜μ΄μ € μš”μ•½](./tokenizer_summary)λ₯Ό ν™•μΈν•΄μ£Όμ„Έμš”. 제일 μ€‘μš”ν•œ 점은 λͺ¨λΈμ΄ ν›ˆλ ¨λμ„ λ•Œμ™€ λ™μΌν•œ 토큰화 κ·œμΉ™μ„ 쓰도둝 λ™μΌν•œ λͺ¨λΈ μ΄λ¦„μœΌλ‘œ ν† ν¬λ‚˜μ΄μ € μΈμŠ€ν„΄μŠ€λ₯Ό λ§Œλ“€μ–΄μ•Ό ν•©λ‹ˆλ‹€.
 
 [`AutoTokenizer`]둜 ν† ν¬λ‚˜μ΄μ €λ₯Ό 뢈러였고,
