
Commit 0d4adb5

Add Isna Persian Dataset (LAION-AI#3631)
This dataset is less important than Wikipedia, so I think [this pull request](LAION-AI#3629) should be merged first. I have uploaded the data to [Hugging Face](https://huggingface.co/datasets/pourmand1376/isna-news) in Open-Assistant's standard format, so it should not need any further processing.

Co-authored-by: Oliver Stanley <olivergestanley@gmail.com>
1 parent 65f5c2b commit 0d4adb5

2 files changed: +3 -0 lines changed


data/datasets/__init__.py

+1
@@ -4,6 +4,7 @@
     "tv_dialogue": "sedthh/tv_dialogue",  # TV and Movie dialogues and transcripts
     "fd_dialogue": "sedthh/fd_dialogue",  # TV and Movie dialogues and transcripts from ForeverDreaming
     "tlcv2.0_oa": "pythainlp/tlcv2.0_oa",  # Thai classical literature texts
+    "fa-isna-news": "pourmand1376/isna-news",  # Isna Persian News
     "fa-wikipedia": "pourmand1376/fa-wikipedia",  # Farsi Wikipedia texts
 }
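The hunk above adds one entry to a dictionary that maps a short dataset name to a Hugging Face repository ID. As a rough illustration of how such a mapping could be consumed, here is a minimal sketch. The names `DATASET_REGISTRY` and `resolve_dataset` are hypothetical: the diff does not show what the real dictionary is called or how the project reads it.

```python
# Hypothetical name: the diff does not show the real dictionary's identifier.
# Only a few of the entries visible in the hunk are copied here.
DATASET_REGISTRY = {
    "tlcv2.0_oa": "pythainlp/tlcv2.0_oa",  # Thai classical literature texts
    "fa-isna-news": "pourmand1376/isna-news",  # Isna Persian News
    "fa-wikipedia": "pourmand1376/fa-wikipedia",  # Farsi Wikipedia texts
}


def resolve_dataset(name: str) -> str:
    """Return the Hugging Face repository ID registered under a short name.

    Illustrative helper, not part of the commit.
    """
    try:
        return DATASET_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(DATASET_REGISTRY))
        raise KeyError(f"unknown dataset {name!r}; known names: {known}") from None


if __name__ == "__main__":
    print(resolve_dataset("fa-isna-news"))
```

Assuming the `datasets` library is installed and the repository is reachable, the resolved ID could then be passed to `datasets.load_dataset(resolve_dataset("fa-isna-news"))` to download the data.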

data/datasets/fa-isna-news/README.md

+2
@@ -0,0 +1,2 @@
+This text-only dataset is crawled from [Isna news](https://isna.ir/). This is
+the biggest Farsi news agency and thus the text is pretty clean.
