Commit 27033a5: fix typos (LAION-AI#3216)

fix typos - "try and" isn't correct grammar but I'm willing to leave it be if requested.

Parent: 1b608cd

File tree: 31 files changed (+47 -47 lines)


backend/README.md (+1 -1)

@@ -78,7 +78,7 @@ Note: The api docs should be automatically updated by the
 
 ## Running Celery Worker(s) for API and periodic tasks
 
-Celery workers are used for Huggiface API calls like toxicity and feature
+Celery workers are used for Huggingface API calls like toxicity and feature
 extraction. Celery Beat along with worker is used for periodic tasks like user
 streak update

backend/oasst_backend/prompt_repository.py (+1 -1)

@@ -253,7 +253,7 @@ def store_text_reply(
         )
         if not ts.active:
             logger.warning(
-                f"Received messsage for inactive tree {parent_message.message_tree_id} (state='{ts.state.value}')."
+                f"Received message for inactive tree {parent_message.message_tree_id} (state='{ts.state.value}')."
             )
 
         if check_duplicate and not settings.DEBUG_ALLOW_DUPLICATE_TASKS:

backend/sql_snippets.md (+1 -1)

@@ -2,7 +2,7 @@
 
 Here are find some SQL queries to inspect the current OA postgres DB.
 
-# Baics Stats
+# Basic Stats
 
 ```sql
 -- tables row counts

data/datasets/bart_searchgpt_wiki_nlp_augment/README.md (+2 -2)

@@ -17,11 +17,11 @@ Related to [Issue #2004](https://github.com/LAION-AI/Open-Assistant/issues/2004)
 - OA format data (BART-based):
   https://huggingface.co/datasets/michaelthwan/oa_wiki_qa_bart_10000row
 
-### Syntheic data based on BART
+### Synthetic data based on BART
 
 ![wiki_augment_bart](./img/wiki_augment_bart.png)
 
-### Syntheic data based on SearchGPT
+### Synthetic data based on SearchGPT
 
 ![wiki_augment_searchgpt](./img/wiki_augment_searchgpt.png)

data/datasets/gutenberg/README.md (+2 -2)

@@ -33,7 +33,7 @@ size_categories:
 - 1K<n<10K
 ---
 
-# Dataset Card for Project Gutenber - Multilanguage eBooks
+# Dataset Card for Project Gutenberg - Multilanguage eBooks
 
 A collection of 7907 non-english (about 75-80% of all the ES, DE, FR, NL, IT,
 PT, HU books available on the site) and 48 285 english (80%+) language ebooks

@@ -103,7 +103,7 @@ metadata columns).
   https://www.gutenberg.org/help/copyright.html and
   https://www.gutenberg.org/policy/permission.html
 - Project Gutenberg has the following requests when using books without
-  metadata: _Books obtianed from the Project Gutenberg site should have the
+  metadata: _Books obtained from the Project Gutenberg site should have the
   following legal note next to them: "This eBook is for the use of anyone
   anywhere in the United States and most other parts of the world at no cost and
   with almost" no restrictions whatsoever. You may copy it, give it away or

data/datasets/oa_leet10k/README.md (+1 -1)

@@ -6,7 +6,7 @@ Here we convert oa_leet10k dataset to be uploaded to huggingface.
 
 Takes this Kaggle dataset 'leetcode-solutions'
 https://www.kaggle.com/datasets/erichartford/leetcode-solutions, and turns them
-into basic dialogue using a preset list of user prompt tempaltes.
+into basic dialogue using a preset list of user prompt templates.
 
 ### Some ideas for extending this dataset

data/datasets/oa_stackexchange/README.md (+2 -2)

@@ -39,9 +39,9 @@ This dataset is taken from https://archive.org/details/stackexchange.
 
 There's a single parquet file combining all stackexchange sites. The threads
 have been filtered as follows: only threads with an accepted answer, for which
-both the question and response is less than 1000 characters have been choosen.
+both the question and response is less than 1000 characters have been chosen.
 Other answers, or questions without accepted answers, or long entries have been
-droppped.
+dropped.
 
 Each row consists of

data/datasets/poetry_instruction/README.md (+3 -3)

@@ -1,7 +1,7 @@
 Dataset Description This dataset contains around 14,000 poems from the
-PoetryFoundation.org site. They are converted to question:response pairs,
-usingthe tags as topics. 5% of the dataset is titling requests -- the user
-provides apoem and asks the assistant to title it.
+PoetryFoundation.org site. They are converted to question:response pairs, using
+the tags as topics. 5% of the dataset is titling requests -- the user provides a
+poem and asks the assistant to title it.
 
 It can be found here, on my HuggingFace -
 https://huggingface.co/datasets/checkai/instruction-poems

data/datasets/recipes/README.md (+2 -2)

@@ -9,12 +9,12 @@ dataset to be uploaded to huggingface.
 Takes this Kaggle dataset 'Recipes from Tasty'
 https://www.kaggle.com/datasets/zeeenb/recipes-from-tasty?select=ingredient_and_instructions.json,
 filters for the top 1,000 highest rated recipes, and turns them into basic
-dialogue using a preset list of user prompt tempaltes.
+dialogue using a preset list of user prompt templates.
 
 ### Some ideas for extending this dataset
 
 This dataset is nicely structured, and the ingredients section includes the
 quantities and units separated out. Some, but not all already include a
 primary_unit (US) and metric_unit. We could find all recipes with both units and
 generate dialogue for the prompt 'convert the ingredients into metric', 'what
-are the ingredients in UK measurments'? etc..
+are the ingredients in UK measurements'? etc..

discord-bots/oa-bot-js/src/modules/open-assistant/langs.ts (+1 -1)

@@ -83,7 +83,7 @@ export const getLocaleDisplayName = (
   locale: string,
   displayLocale = undefined
 ) => {
-  // Intl defaults to English for locales that are not oficially translated
+  // Intl defaults to English for locales that are not officially translated
   if (missingDisplayNamesForLocales[locale]) {
     return missingDisplayNamesForLocales[locale];
   }

docker/grafana/README.md (+1 -1)

@@ -1,7 +1,7 @@
 # Grafana
 
 [Grafana](https://github.com/grafana/grafana) is used to visualize custom
-observabiltiy metrics and much more.
+observability metrics and much more.
 
 This folder contains various configuration files for Grafana.

docker/netdata/README.md (+1 -1)

@@ -2,7 +2,7 @@
 
 [Netdata](https://github.com/netdata/netdata) is an open source monitoring tool.
 
-This folder contains some configfuration files used to set up various netdata
+This folder contains some configuration files used to set up various netdata
 collectors we want to use like Redis, Postgres, etc.
 
 - [`./go.d/postgres.conf`](./go.d/postgres.conf) - Config for Netdata

docker/prometheus/README.md (+1 -1)

@@ -3,7 +3,7 @@
 [Prometheus](https://github.com/prometheus/prometheus) is an open source
 monitoring system.
 
-This folder contains some configfuration files used to set up Prometheus.
+This folder contains some configuration files used to set up Prometheus.
 
 - [`./prometheus.yml`](./prometheus.yml) - Config for Prometheus, including what
   `/metrics` endpoints to scrape.

docs/docs/tasks/label_prompter_reply.md (+1 -1)

@@ -1,7 +1,7 @@
 # Classifying an initial prompt or user reply
 
 In this task, you'll be shown a random message written by another person. This
-message is mimicing a request or question directed towards the assistant - a
+message is mimicking a request or question directed towards the assistant - a
 **prompt**. This prompt could either be a start of a conversation, or a reply to
 a message from the assistant. Your job is to rate parameters like quality or
 politeness, as well as include any applicable labels, such as spam, PII or

model/model_eval/manual/sampling_report.py (+2 -2)

@@ -243,9 +243,9 @@ def parse_args():
         "--prompts", type=str, help="jsonl string prompts input file name", default="./data/en_100_text.jsonl.gz"
     )
     parser.add_argument("--report", type=str, help="json sampling report output file name")
-    parser.add_argument("--seed", type=int, default="42", help="psoudo random number generator seed")
+    parser.add_argument("--seed", type=int, default="42", help="pseudo random number generator seed")
     parser.add_argument("--verbose", action="store_true", default=False)
-    parser.add_argument("-n", type=int, help="number of promtps to use (default: all)")
+    parser.add_argument("-n", type=int, help="number of prompts to use (default: all)")
     parser.add_argument("--num-samples", type=int, default=2, help="number of sampling runs per configuration")
    parser.add_argument("--config", type=str, default="config/default.json", help="configuration file path")
    parser.add_argument("--half", action="store_true", default=False, help="use float16")
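A side note on the `--seed` line in the hunk above: argparse applies the `type` callable to string-valued defaults, so `default="42"` with `type=int` still yields the integer 42 after parsing. A minimal standalone sketch (this parser is illustrative, not the project's full `parse_args`):

```python
import argparse

# argparse parses a string default as if it came from the command line,
# so default="42" combined with type=int produces the integer 42.
parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default="42", help="pseudo random number generator seed")
parser.add_argument("-n", type=int, help="number of prompts to use (default: all)")
args = parser.parse_args([])

print(args.seed)  # 42, an int despite the string default
print(args.n)     # None, since no default was given
```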

model/model_training/configs/config_rl.yaml (+1 -1)

@@ -74,7 +74,7 @@ llama_rlhf:
   quantization: false
   seq2seqmodel: false
   freeze_layer: 52
-  num_layers_unfrozen: -1 # we dont use this, trlx has its own implementation
+  num_layers_unfrozen: -1 # we don't use this, trlx has its own implementation
   residual_dropout: 0.0
   use_flash_attention: true
   dtype: fp16

model/model_training/custom_datasets/qa_datasets.py (+1 -1)

@@ -335,7 +335,7 @@ def __init__(self, cache_dir) -> None:
                 data = json.loads(line)
                 joke = data["joke"]
                 # DO NOT change this
-                # its the data that had syntax error
+                # it's the data that had syntax error
                 explanation = data["explaination"]
                 self.pairs.append(create_dataset_entry_qa(mode="sft", questions=[joke], answers=[explanation]))

notebooks/data-augmentation/anthropic/README.md (+1 -1)

@@ -16,7 +16,7 @@ dataset = load_dataset("shahules786/prosocial_augmented")
 ## Steps
 
 1. Use prosocial dialog dataset to train a
-   [safety label classifer](https://huggingface.co/shahules786/prosocial-classifier).
+   [safety label classifier](https://huggingface.co/shahules786/prosocial-classifier).
 2. Finding Rules of thumbs(rots) present in prosocial dataset that matches
    task_description in red-teaming data.
 3. Use pretrained safety-classifier to predict safety labels for the selected

notebooks/data-augmentation/essay-instructions/README.md (+1 -1)

@@ -8,4 +8,4 @@ collecting for the model
 
 Feel free to contribute to this notebook, it's nowhere near perfect but it's a
 good start. If you want to contribute finding a new model that better suits this
-task would be great. Hugginface has a lot of models that could help.
+task would be great. Huggingface has a lot of models that could help.

notebooks/data-augmentation/wikidata-qa/README.md (+1 -1)

@@ -9,7 +9,7 @@ answer pair is necessary!
 
 A step-by-step guide:
 
-1. Create a WikiGraph crawler instance and define a cache file to avoide
+1. Create a WikiGraph crawler instance and define a cache file to avoid
    redownloading nodes (only English is supported at the moment)
 
 ```Python

notebooks/detoxify-evaluation/README.md (+6 -6)

@@ -43,13 +43,13 @@ Detoxify was tested on 4 different types of inputs
 
 ### Sentences used for testing and rating are contained inside the .ipynb
 
-| Model name | Not obviously toxic | Not obviously non-toxic | Obviously toxic | Obviously non-toxic |
-| :----------: | :--------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------: | :--------------------------------------------------------------: | :-----------------: |
-| original | failed at all, easily accepted racist, sexist overally toxic prompts that were well formulated | Very sensitive on swear words, failed to reckognize context | good performance | good performance |
-| unbiased | Managed to find some hidden toxicity but not on all sentences | Very sensitive explicit language but shown ability to recognize context | Did well but failed to reckognize some gender stereotype mockery | good performance |
-| multilingual | Managed to find some hidden toxicity but not on all sentences | Very sensitive explicit language but shown ability to recognize context | Did well but failed to reckognize some gender stereotype mockery | good performance |
+| Model name | Not obviously toxic | Not obviously non-toxic | Obviously toxic | Obviously non-toxic |
+| :----------: | :--------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------: | :-------------------------------------------------------------: | :-----------------: |
+| original | failed at all, easily accepted racist, sexist overally toxic prompts that were well formulated | Very sensitive on swear words, failed to recognize context | good performance | good performance |
+| unbiased | Managed to find some hidden toxicity but not on all sentences | Very sensitive explicit language but shown ability to recognize context | Did well but failed to recognize some gender stereotype mockery | good performance |
+| multilingual | Managed to find some hidden toxicity but not on all sentences | Very sensitive explicit language but shown ability to recognize context | Did well but failed to recognize some gender stereotype mockery | good performance |
 
-Subjectivly 'unbiased' looks like the best performing model.
+Subjectively 'unbiased' looks like the best performing model.
 
 I don't think it would do well as a security layer in a live version of open
 assistant unless we do some finetuning first, because it can be fooled to pass

notebooks/example/README.md (+2 -2)

@@ -3,7 +3,7 @@
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LAION-AI/Open-Assistant/blob/main/notebooks/example/example.ipynb)
 
 This folder contains an example reference notebook structure and approach for
-this project. Please try and follow this structure as closely as possible. While
+this project. Please try to follow this structure as closely as possible. While
 things will not exactly be the same for each notebook some principles we would
 like to try ensure are:
 
@@ -34,7 +34,7 @@ same directory that the notebook lives in so relative links etc should work as
 expected (for example `example.ipynb` will read some sample data from
 `data/data.csv`).
 
-If you are adding a notebook please try and add a similar cell to the top of the
+If you are adding a notebook please try to add a similar cell to the top of the
 notebook so that it is easy for others to run the notebook in colab. If your
 notebook does not have any dependencies beyond what already comes as standard in
 Google Colab then you do not need such a cell, just an "Open in Colab" badge

oasst-data/README.md (+1 -1)

@@ -145,7 +145,7 @@ messages are those which have a `review_result` that is `false`.
 
 Conversation threads are a linear lists of messages. THese objects can be
 identified by the presence of the `"thread_id"` property which contains the UUID
-of the last messsage of the the thread (which can be used to reconstruct the
+of the last message of the the thread (which can be used to reconstruct the
 thread by returning the list of ancestor messages up to the prompt root
 message). The message_id of the first message is normally also the id of the
 message-tree that contains the thread.
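The hunk above describes rebuilding a thread by walking ancestor messages from the `thread_id` message back to the prompt root. A minimal sketch of that walk, using a hypothetical in-memory parent map rather than the actual oasst-data objects:

```python
# Hypothetical message store: each message records its parent, and the
# root prompt has parent None. This dict is NOT the real oasst-data
# schema, just an illustration of the ancestor walk described above.
messages = {
    "root": {"text": "initial prompt", "parent": None},
    "a1": {"text": "assistant reply", "parent": "root"},
    "u2": {"text": "prompter follow-up", "parent": "a1"},
}

def reconstruct_thread(last_message_id):
    """Walk parent links from the last message up to the prompt root."""
    thread = []
    current = last_message_id
    while current is not None:
        thread.append(messages[current]["text"])
        current = messages[current]["parent"]
    thread.reverse()  # return the thread in conversation order, root first
    return thread

print(reconstruct_thread("u2"))
# ['initial prompt', 'assistant reply', 'prompter follow-up']
```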

oasst-shared/README.md (+1 -1)

@@ -1,3 +1,3 @@
-# Shared Python code for Open Assisstant
+# Shared Python code for Open Assistant
 
 Run `pip install -e .` to install the package in editable mode.

scripts/data-collection/twitter/README.md (+2 -2)

@@ -38,8 +38,8 @@ conversation, or at least as a prompt with replies.
   files for future processing. Note: Using polars instead of pandas due to
   performance reasons.
 - Wrote scripts that process the large dump of tweets into conversation threads
-  using the tree and node architecture. This results in aroun 17K conversation
-  threads bassed on a dump of 90M tweets.
+  using the tree and node architecture. This results in around 17K conversation
+  threads based on a dump of 90M tweets.
 - Script can output the conversation threads into a jsonl file for further
   filtering or use in models.
scripts/postprocessing/scoring.py

+2-2
Original file line numberDiff line numberDiff line change
@@ -76,7 +76,7 @@ def score_update_votes(new_vote: int, consensus: npt.ArrayLike, voter_data: Vote
7676
consensus_ranking = np.argsort(np.argsort(consensus))
7777
new_points = consensus_ranking[new_vote] + voter_data.voting_points
7878

79-
# we need to correct for 0 indexing, if you are closer to "right" than "wrong" of the conensus,
79+
# we need to correct for 0 indexing, if you are closer to "right" than "wrong" of the consensus,
8080
# it's a good vote
8181
new_good_votes = int(consensus_ranking[new_vote] > (len(consensus) - 1) / 2) + voter_data.num_good_votes
8282
new_num_votes = voter_data.num_votes + 1
@@ -105,7 +105,7 @@ def score_update_prompts(consensus: npt.ArrayLike, voter_data: Voter) -> Voter:
105105
delta_votes = np.sum(consensus_ranking * consensus / sum(consensus))
106106
new_points = delta_votes + voter_data.prompt_points
107107

108-
# we need to correct for 0 indexing, if you are closer to "right" than "wrong" of the conensus,
108+
# we need to correct for 0 indexing, if you are closer to "right" than "wrong" of the consensus,
109109
# it's a good vote
110110
new_good_prompts = int(delta_votes > 0) + voter_data.num_good_prompts
111111
new_num_prompts = voter_data.num_prompts + 1
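The double `argsort` in the hunks above converts raw consensus scores into 0-indexed ranks, and the `(len(consensus) - 1) / 2` comparison checks whether a vote landed in the upper half of that ranking. A small standalone illustration (the array values are made up):

```python
import numpy as np

# Double argsort maps each score to its 0-indexed rank: the smallest
# score gets rank 0, the largest gets rank len - 1.
consensus = np.array([0.1, 0.7, 0.2])
consensus_ranking = np.argsort(np.argsort(consensus))
print(consensus_ranking)  # [0 2 1]

# A vote counts as "good" when its rank is above the midpoint
# (len(consensus) - 1) / 2, mirroring score_update_votes above.
new_vote = 1  # index of the option the voter picked
is_good_vote = consensus_ranking[new_vote] > (len(consensus) - 1) / 2
print(is_good_vote)  # True: option 1 has the highest consensus rank
```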

website/src/components/Chat/ChatListItem.tsx (+1 -1)

@@ -75,7 +75,7 @@ export const ChatListItem = ({
 
   return (
     <Button
-      // @ts-expect-error error due to dynamicly changing as prop
+      // @ts-expect-error error due to dynamically changing as prop
       ref={rootRef}
       {...(!isEditing ? { as: Link, href: ROUTES.CHAT(chat.id) } : { as: "div" })}
       variant={isActive ? "solid" : "ghost"}

website/src/components/Messages/MessageTableEntry.tsx (+2 -2)

@@ -165,12 +165,12 @@ export const MessageTableEntry = forwardRef<HTMLDivElement, MessageTableEntryPro
         )}
         {message.deleted && isAdminOrMod && (
           <Badge colorScheme="red" textTransform="capitalize">
-            Deleted {/* dont translate, it's admin only feature */}
+            Deleted {/* don't translate, it's admin only feature */}
           </Badge>
         )}
         {message.review_result === false && isAdminOrMod && (
           <Badge colorScheme="yellow" textTransform="capitalize">
-            Spam {/* dont translate, it's admin only feature */}
+            Spam {/* don't translate, it's admin only feature */}
           </Badge>
         )}
       </Flex>

website/src/lib/oasst_api_client.ts (+1 -1)

@@ -420,7 +420,7 @@ export class OasstApiClient {
     return this.get<BackendUser>(`/api/v1/frontend_users/${user.auth_method}/${user.id}`);
   }
 
-  // TODO: add update-able fields eg: enbaled, notes, show_on_leaderboard, etc..
+  // TODO: add update-able fields eg: enabled, notes, show_on_leaderboard, etc..
   upsert_frontend_user(user: BackendUserCore) {
     // the backend does a upsert operation with this call
     return this.post<BackendUser>(`/api/v1/frontend_users/`, user);

website/src/lib/users.ts (+1 -1)

@@ -23,7 +23,7 @@ export const getBackendUserCore = async (id: string): Promise<BackendUserCore> =
 };
 
 /**
- * convert a user object to a canoncial representation used for interacting with the backend
+ * convert a user object to a canonical representation used for interacting with the backend
  * @param user frontend user object, from prisma db
 */
 export const convertToBackendUserCore = <T extends { accounts: Account[]; id: string; name: string }>(

website/src/pages/api/auth/[...nextauth].ts (+1 -1)

@@ -139,7 +139,7 @@ const authOptions: AuthOptions = {
     async signIn({ user, account, isNewUser }) {
       if (isNewUser && account.provider === "email" && !user.name) {
         // only generate a username if the user is new and they signed up with email and they don't have a name
-        // although the name already assigned in the jwt callback, this is to ensure notthing breaks, and we should never reach here.
+        // although the name already assigned in the jwt callback, this is to ensure nothing breaks, and we should never reach here.
         await prisma.user.update({
           data: {
             name: generateUsername(),
