Commit 27047b2

authored Feb 19, 2024
feat: allow dataset names to start with numbers (#70)
1 parent 29589af commit 27047b2

4 files changed, +14 -10 lines changed


‎Project.toml

+1-1
@@ -1,7 +1,7 @@
 name = "DataSets"
 uuid = "c9661210-8a83-48f0-b833-72e62abce419"
 authors = ["Chris Foster <chris42f@gmail.com> and contributors"]
-version = "0.2.10"
+version = "0.2.11"
 
 [deps]
 AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"

‎docs/src/design.md

+1-2
@@ -93,7 +93,7 @@ names to `DataSet`s. Perhaps it also maintains the serialized `DataSet`
 information as well for those datasets which are not registered. It might be
 stored in a Data.toml, in analogy to Project.toml.
 
-Maintaince of the data project should occur via a data REPL.
+Maintenance of the data project should occur via a data REPL.
 
 ## Data Registries
 
@@ -277,4 +277,3 @@ array of strings)
 is restricted to tabular data, but seems similar in spirit to DataSets.jl.
 * [FileTrees.jl](http://shashi.biz/FileTrees.jl) provides tools for
 representing and processing tree-structured data lazily and in parallel.
-

‎src/DataSets.jl

+9-6
@@ -84,14 +84,17 @@ end
 """
     check_dataset_name(name)
 
-Check whether a dataset name is valid. Valid names include start with a letter
-and may contain letters, numbers or `_`. Names may be hieracicial, with pieces
-separated with forward slashes. Examples:
+Check whether a dataset name is valid.
+
+Valid names must start with a letter or a number, the rest of the name can also contain `-`
+and `_` characters. The names can also be hieracicial, with segments separated by forward
+slashes (`/`). Each segment must also start with either a letter or a number. For example:
 
     my_data
     my_data_1
     username/data
-    organization-dataset_name/project/data
+    organization_name/project-name/data
+    123user/456dataset--name
 """
 function check_dataset_name(name::AbstractString)
     if !occursin(DATASET_NAME_REGEX, name)
@@ -101,10 +104,10 @@ end
 # DataSet names disallow most punctuation for now, as it may be needed as
 # delimiters in data-related syntax (eg, for the data REPL).
 const DATASET_NAME_REGEX_STRING = raw"""
-[[:alpha:]]
+[[:alnum:]]
 (?:
     [-[:alnum:]_] |
-    / (?=[[:alpha:]])
+    / (?=[[:alnum:]])
 )*
 """
 const DATASET_NAME_REGEX = Regex("^\n$(DATASET_NAME_REGEX_STRING)\n\$", "x")
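
The effect of swapping `[[:alpha:]]` for `[[:alnum:]]` in the two positions above can be checked in isolation. The following is a minimal sketch, not code from this commit: `name_regex`, `OLD_REGEX`, and `NEW_REGEX` are hypothetical names introduced only for illustration, and the pattern is rebuilt here from the regex text shown in the diff rather than imported from DataSets.jl.

```julia
# Hypothetical helper (not part of DataSets.jl): rebuilds the dataset name
# pattern from the diff, parameterised on the character class that must
# start the name and every `/`-separated segment.
function name_regex(first_char_class::AbstractString)
    body = """
    $(first_char_class)
    (?:
        [-[:alnum:]_] |
        / (?=$(first_char_class))
    )*
    """
    # The "x" flag makes the regex engine ignore the layout whitespace.
    return Regex("^\n$(body)\n\$", "x")
end

const OLD_REGEX = name_regex("[[:alpha:]]")  # before this commit
const NEW_REGEX = name_regex("[[:alnum:]]")  # after this commit

# Sample names taken from the test lists in test/runtests.jl.
for name in ("a/b", "1", "a/1", "12ab/34cd", "a/-a", "a/b/", "a.b")
    println(rpad(repr(name), 14),
            "old: ", occursin(OLD_REGEX, name),
            "  new: ", occursin(NEW_REGEX, name))
end
```

Names such as `1` and `a/1` are rejected by the old pattern and accepted by the new one, while a segment starting with `-` (for example `a/-a`) is rejected by both, because the lookahead after `/` still requires a letter or digit.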

‎test/runtests.jl

+3-1
@@ -101,13 +101,15 @@ end
 @testset "Data set name parsing" begin
     @testset "Valid name: $name" for name in (
         "a_b", "a-b", "a1", "δεδομένα", "a/b", "a/b/c", "a-", "b_",
+        "1", "a/1", "123", "12ab/34cd", "1/2/3", "1-2-3", "x_-__", "a---",
     )
         @test DataSets.check_dataset_name(name) === nothing
         @test DataSets._split_dataspec(name) == (name, nothing, nothing)
     end
 
     @testset "Invalid name: $name" for name in (
-        "1", "a b", "a.b", "a/b/", "a//b", "/a/b", "a/-", "a/1", "a/ _/b"
+        "a b", "a.b", "a/b/", "a//b", "/a/b", "a/-", "a/ _/b",
+        "a/-a", "a/-1",
     )
         @test_throws ErrorException DataSets.check_dataset_name(name)
         @test DataSets._split_dataspec(name) == (nothing, nothing, nothing)