
Commit 731f66e

Edited 220_Token_normalization/00_Intro.asciidoc with Atlas code editor

1 parent 355631a

1 file changed (+5 −5 lines)


220_Token_normalization/00_Intro.asciidoc (+5 −5)
@@ -1,11 +1,11 @@
 [[token-normalization]]
-== Normalizing tokens
+== Normalizing Tokens
 
-Breaking text up into tokens is ((("normalization", "of tokens")))((("tokens", "normalizing")))only half the job. In order to make those
-tokens more easily searchable, they need to go through some _normalization_
+Breaking text into tokens is ((("normalization", "of tokens")))((("tokens", "normalizing")))only half the job. To make those
+tokens more easily searchable, they need to go through a _normalization_
 process to remove insignificant differences between otherwise identical words,
-such as uppercase vs lowercase. Perhaps we also need to remove significant
-differences, to make `esta`, `ésta` and `está` all searchable as the same
+such as uppercase versus lowercase. Perhaps we also need to remove significant
+differences, to make `esta`, `ésta`, and `está` all searchable as the same
 word. Would you search for `déjà vu`, or just for `deja vu`?
 
 This is the job of the token filters, which((("token filters"))) receive a stream of tokens from
