-
Notifications
You must be signed in to change notification settings - Fork 10
Example Galaxy Workflowrun ro-crate #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The {
"@id": "./",
"@type": "Dataset",
"mentions": {"@id": "#b91b07ec-5752-465d-a0c4-912e0312abc0"},
...
} |
The |
{ | ||
"@id": "#lineNum-param", | ||
"@type": "FormalParameter", | ||
"additionalType": "None", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be "Integer". If it's not straightforward to convert from the Galaxy type, better not add the "additionalType" property at all ("None" is not in the RO-Crate context).
{ | ||
"@id": "#advanced-param", | ||
"@type": "FormalParameter", | ||
"additionalType": "None", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be "Text" (looking at the value in the corresponding pv). If it's not straightforward to convert from the Galaxy type, better not add the "additionalType" property at all ("None" is not in the RO-Crate context).
"valueRequired": true | ||
}, | ||
{ | ||
"@id": "urn:uuid:eabc423a-227e-4096-8e14-74d0088c8ef9", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be changed to "#eabc423a-227e-4096-8e14-74d0088c8ef9", same for other IDs like this one
Should also expand "@id": "./",
"@type": "Dataset",
"conformsTo": [
{"@id": "https://w3id.org/ro/wfrun/process/0.1"},
{"@id": "https://w3id.org/ro/wfrun/workflow/0.1"},
{"@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"}
], and their static contextual entities: { "@id": "https://w3id.org/ro/wfrun/process/0.1",
"@type": "CreativeWork",
"name": "Process Run Crate",
"version": "0.1"
},
{ "@id": "https://w3id.org/ro/wfrun/workflow/0.1",
"@type": "CreativeWork",
"name": "Workflow Run Crate",
"version": "0.1"
},
{ "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0",
"@type": "CreativeWork",
"name": "Workflow RO-Crate",
"version": "1.0"
}, |
The various They seem to be JSON files, but have the |
"additionalType": "None", | ||
"description": "select 3 lines", | ||
"name": "select lines parameter", | ||
"valueRequired": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be "valueRequired": "True"
, i.e., pointing to True (same for other occurrences)
I'm not sure if there's a way to reply with the updated output comment by comment? |
The changes look good! However, there's another problem I hadn't noticed before: the representation of inputs and outputs does not match the workflow's structure. The workflow takes two collections and merges them into a single one, then concatenates the datasets from the merged collection into a single dataset and finally selects some lines from the concatenated dataset. The input parameters of the workflow should be the two input collections and the parameter that controls the number of lines for the final selection (plus the advanced parameter, which seems to control the handling of merge conflicts), while the output should be the file containing the selected lines. The current metadata file has individual files from the collections as inputs instead; also, inputs include the concatenated dataset, which is an intermediate output. The workflow's {
...
"input": [
{"@id": "#lineNum-param"},
{"@id": "#advanced-param"},
{"@id": "#collection1-param"},
{"@id": "#collection2-param"}
],
"output": [
{"@id": "#4a0f4078-5aff-4e02-9f9c-4ad510050e54"}
],
...
},
{
"@id": "#collection1-param",
"@type": "FormalParameter",
"additionalType": "Collection",
"name": "collection 1"
},
{
"@id": "#collection2-param",
"@type": "FormalParameter",
"additionalType": "Collection",
"name": "collection 2"
} The action's {
...
"object": [
{"@id": "#lineNum-pv"},
{"@id": "#advanced-pv"},
{"@id": "#dataset_collection-11"},
{"@id": "#dataset_collection-10"}
],
"result": [
{"@id": "datasets/Select_first_on_data_48_49.txt"}
],
...
} with That was the main thing. I've also found two minor issues:
I have pushed here the full expected metadata file with all the changes. |
Thanks Simone, that clears up some of the details of the format. I'll continue with it. |
I've made further updates to the code addressing the required changes. Unfortunately the diff with the previous version is a bit difficult to make since I moved around some parts. A few things to note:
|
OK. If they are workflow outputs for Galaxy, they have to be listed as workflow outputs in the RO-Crate as well. Regarding the "advanced" parameter, if it's not enabled as a workflow parameter then it should not be included in the RO-Crate. However, I have some comments regarding its representation, which would become relevant in those cases where such parameters would have to be included. In the current version of the example, the {
"@id": "#advanced-pv",
"@type": "PropertyValue",
"exampleOfWork": {"@id": "#advanced-param"},
"name": "merge collections tool PropertyValue",
"value": {
"conflict": {
"__current_case__": 0,
"duplicate_options": "suffix_conflict",
"suffix_pattern": "_#"
}
}
} I.e., the value has been inserted as JSON and merged with the overall JSON structure, making the RO-Crate invalid. In the previous version, instead, the value was inserted as a string, which is OK: "value": "{\"conflict\": {\"__current_case__\": 0, \"duplicate_options\": \"suffix_conflict\", \"suffix_pattern\": \"_#\"}}" Also, I think that "advanced" refers to the whole set of "hidden" parameters in the Galaxy interface, and there could be more than one. So the parameter should actually be called "conflict", leading to something like: {
"@id": "#conflict-pv",
"@type": "PropertyValue",
"exampleOfWork": {"@id": "#conflict-param"},
"name": "conflict",
"value": "{\"__current_case__\": 0, \"duplicate_options\": \"suffix_conflict\", \"suffix_pattern\": \"_#\"}"
},
{
"@id": "#conflict-param",
"@type": "FormalParameter",
"additionalType": "Text",
"name": "conflict",
"valueRequired": "False"
}, But, again, in this specific case the parameter should not be included. Here's a list of issues I've found in the current version of the example:
I've pushed the expected metadata file according to the above changes here. |
I agree with all changes, the only one I have doubts about is this:
This is intended since the two input collections reference the same input dataset and the input datasets use the filename as the id:
and
|
Yes, but when you merge the collections only one of the datasets with a repeated name is included in the merged collection. This is explained here (in the "Merge collections" subsection). From the RO-Crate metadata file's standpoint it's the same, duplicate entries do not make sense: though multiple values are represented as JSON lists, their JSON-LD semantics is basically that of sets. |
and which parameters to include
Updated the code base to address the required changes. About the merged collections, in the example workflow the collections are merged using the advanced parameter that handles conflicts (see the screenshot bellow). So there are two references to one dataset but the two elements of the collection do receive a unique "element identifier". Should this be expressed somehow in the ro-crate metadata? |
Since the selected conflict handler appends suffixes to conflicted element identifiers, the same can be done in the RO-Crate: the generator can add two copies of "hasPart": [
{
"@id": "datasets/hello_33_1.txt"
},
{
"@id": "datasets/world_34.txt"
},
{
"@id": "datasets/hello_33_2.txt"
},
{
"@id": "datasets/universe_31.txt"
}
] |
An alternative is to slightly change the example workflow to use the default conflict resolution, which keeps only one copy. |
I've changed the conflict resolution parameter for merging collections to the default, keep first. |
Looks good! Merging. For the Zenodo upload, please use a zip file: if you do that, Zenodo recognizes the format and generates a summary of contained files to display in the record's page. If you can, use the |
Example Galaxy Workflowrun ro-crate which includes Galaxy features: collections and parameters