Commit 2b83d5d

Add brief documentation for backend data export process (LAION-AI#2068)
Close LAION-AI#2026

This aims to address a couple of questions/issues we've had around how to export data. The list at the end of the section can be expanded as more questions are asked.
1 parent 6a7aab7 commit 2b83d5d

2 files changed: +26 -1 lines changed

backend/README.md

+25
@@ -70,3 +70,28 @@ wget localhost:8080/api/v1/openapi.json -O docs/docs/api/openapi.json
 
 Note: The api docs should be automatically updated by the
 `test-api-contract.yaml` workflow.
+
+## Exporting Data
+
+When you have collected some data in the backend database, you can export it
+using the `export.py` script provided in this directory. This can be run from
+the command line using a Python environment with the same requirements as the
+backend itself. The script connects to the database in the same manner as the
+backend and therefore uses the same environment variables.
+
+A simple usage of the script, to export all English trees which successfully
+passed the review process, may look like:
+
+```bash
+python export.py --lang en --export-file output.jsonl
+```
+
+There are many options available to filter the data which can be found in the
+help message of the script: `python export.py --help`.
+
+**Why isn't my export working?**
+
+Common issues include (WIP):
+
+- The messages have not passed the review process yet, so the trees are not ready
+  for export. This can be solved by including the `--include-spam` flag.
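
Once the export command from the README section above has produced a file such as `output.jsonl`, it can be inspected from Python. The following is a minimal sketch, assuming the file is in JSON Lines format (one JSON value per line), as the `.jsonl` extension suggests. The helper name `load_jsonl` is hypothetical and not part of `export.py`, and nothing is assumed about the fields inside each record.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Return one parsed JSON value per non-empty line of `path`.

    Hypothetical helper for illustration; not part of export.py.
    """
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines between records
                records.append(json.loads(line))
    return records
```

For example, `len(load_jsonl("output.jsonl"))` would give the number of records in the export, which is a quick way to check whether an export came back empty.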

backend/export.py

+1 -1
@@ -336,7 +336,7 @@ def parse_args():
     parser.add_argument(
         "--include-spam",
         action="store_true",
-        help="Export including messages with negative review result.",
+        help="Export including messages with no review or negative review result.",
     )
     parser.add_argument(
         "--spam-only",

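Because `--include-spam` is declared with `action="store_true"`, it takes no value: it is `False` unless the flag is present on the command line. The standalone sketch below illustrates that behavior. The `build_parser` function is hypothetical and mirrors only the one argument shown in the diff, not the full `parse_args()` in `export.py`.

```python
import argparse


def build_parser():
    # Hypothetical stand-in that reproduces only the --include-spam argument.
    parser = argparse.ArgumentParser(description="Sketch of a single export flag.")
    parser.add_argument(
        "--include-spam",
        action="store_true",
        help="Export including messages with no review or negative review result.",
    )
    return parser


parser = build_parser()
print(parser.parse_args([]).include_spam)  # False
print(parser.parse_args(["--include-spam"]).include_spam)  # True
```

argparse converts the dashes in the option name to underscores, so the parsed value is read as `args.include_spam`.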