
Commit acbf3d8

Updated the tutorial for tokenizer.
Parent commit: 36bb2eb

1 file changed (+45 -0 lines)

tutorials/tokenizer.md

# Tokenizer

## INSTALLATION

`Tokenizer` is provided as a `.gem` package. Simply install it via
[RubyGems](http://rubygems.org/gems/tokenizer).

To install `tokenizer`, issue the following command:

``` shell
$ gem install tokenizer
```

If you want a system-wide installation, run this command as root
(for example, using `sudo`).

Alternatively, use your Gemfile for dependency management.
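If you manage dependencies with Bundler, a minimal Gemfile entry might look like the sketch below. No version constraint is shown because this tutorial does not specify one; pinning a version is up to your project.

```ruby
# Gemfile (minimal sketch; no version constraint assumed)
source 'https://rubygems.org'

gem 'tokenizer'
```

Running `bundle install` afterwards installs the gem for your project.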
## SYNOPSIS

You can use `Tokenizer` in two ways.

* As a command line tool:

``` shell
$ echo 'Hi, ich gehe in die Schule!' | tokenize
```

* As a library for embedded tokenization:

``` ruby
require 'tokenizer'

de_tokenizer = Tokenizer::WhitespaceTokenizer.new
de_tokenizer.tokenize('Ich gehe in die Schule!')
# => ["Ich", "gehe", "in", "die", "Schule", "!"]
```

* The library also accepts customized `PRE` and `POST` lists, here adding `|` to the `POST` list:

``` ruby
require 'tokenizer'

de_tokenizer = Tokenizer::WhitespaceTokenizer.new(:de, { post: Tokenizer::Tokenizer::POST + ['|'] })
de_tokenizer.tokenize('Ich gehe|in die Schule!')
# => ["Ich", "gehe", "|in", "die", "Schule", "!"]
```

See the documentation of the `Tokenizer::WhitespaceTokenizer` class for details
on particular methods.
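In the customized example, `gehe|in` becomes `gehe` and `|in`: judging from that output, a `POST` character ends the preceding token and starts the next one rather than standing alone. The following self-contained sketch illustrates that idea without using the gem. It is an assumption-based illustration, not the gem's implementation: `sketch_tokenize`, `SKETCH_PRE`, `SKETCH_POST` and `SENTENCE_FINAL` are hypothetical names, and their character lists are assumed subsets, not the gem's constants.

```ruby
# Hypothetical sketch of a whitespace tokenizer with PRE/POST lists.
# NOT the tokenizer gem's code; the character lists are assumptions.
SKETCH_PRE = ['¿', '¡', '(', '[', '{'].freeze
SKETCH_POST = ['!', '?', ',', ':', ';', '.', ')', ']', '}'].freeze
# Assumption: sentence-final marks always become tokens of their own.
SENTENCE_FINAL = ['!', '?', '.'].freeze

def sketch_tokenize(text, pre: SKETCH_PRE, post: SKETCH_POST)
  tokens = []
  # Split on whitespace first, then scan each field character by character.
  text.strip.split(/[[:space:]]+/).each do |field|
    current = ''
    field.each_char do |ch|
      if pre.include?(ch)
        # A PRE character is emitted as a separate token.
        tokens << current unless current.empty?
        tokens << ch
        current = ''
      elsif post.include?(ch)
        # A POST character ends the preceding token ...
        tokens << current unless current.empty?
        if SENTENCE_FINAL.include?(ch)
          tokens << ch
          current = ''
        else
          # ... and otherwise starts the next one (hence "|in").
          current = ch
        end
      else
        current += ch
      end
    end
    tokens << current unless current.empty?
  end
  tokens
end

p sketch_tokenize('Ich gehe in die Schule!')
# => ["Ich", "gehe", "in", "die", "Schule", "!"]
p sketch_tokenize('Ich gehe|in die Schule!', post: SKETCH_POST + ['|'])
# => ["Ich", "gehe", "|in", "die", "Schule", "!"]
```

Without `|` in the `POST` list, the sketch keeps `gehe|in` as a single token, which is why the list has to be extended. How the gem itself treats sentence-final marks and `PRE` characters may differ from this sketch.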
