Skip to content

Conversation

@tiferet
Copy link
Contributor

@tiferet tiferet commented Dec 1, 2022

Adds a test to detect changes in the endpoints that get scored at inference time.

Note that the queries' .ql files (e.g. src/NosqlInjectionATM.ql) can't be called directly here, because they use the model and compute a score.

Closes https://github.com/github/ml-ql-adaptive-threat-modeling/issues/2135

@tiferet tiferet requested review from a team and kaeluka and removed request for a team December 1, 2022 22:26
@github-actions github-actions bot added the ATM label Dec 1, 2022
jhelie
jhelie previously approved these changes Dec 2, 2022
Copy link
Contributor

@jhelie jhelie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tiferet, LGTM. I've added XssThroughDom but this didn't require updating the .expected files : I guess our some small examples do not contain instances of source for Xss and not for XssThroughDom (or vice versa).

@jhelie
Copy link
Contributor

jhelie commented Dec 2, 2022

ah I see we were both working on the PR: I can push my change or you can update the ExtractEndpointDataInference.ql file yourself after rebasing on main.

(I'm de-approving in case you are still working on this and I'll refrain pushing my branch unless you tell me to do so)

@jhelie jhelie dismissed their stale review December 2, 2022 14:30

changes since review

@tiferet
Copy link
Contributor Author

tiferet commented Dec 2, 2022

Thanks @tiferet, LGTM. I've added XssThroughDom but this didn't require updating the .expected files : I guess our some small examples do not contain instances of source for Xss and not for XssThroughDom (or vice versa).

I don't see any commits from you, but don't worry about it -- I'll make the needed 1-line change and push it 😄

@tiferet tiferet force-pushed the tiferet/endpoint-filter-test branch from 4cf32b8 to f1f356f Compare December 2, 2022 14:54
@tiferet
Copy link
Contributor Author

tiferet commented Dec 2, 2022

@kaeluka / @henrymercer Does it make sense that there are no XssThroughDom sink candidates with flow from a source in endpoint_large_scale? Was that set created specifically for our existing four queries? If so, do we need to add to it each time we boost a new query?

Adds a test to detect changes in the endpoints that get scored at inference time.
Not strictly needed, but better to keep things private when possible
Oops, now I see why that wasn't private
@tiferet tiferet force-pushed the tiferet/endpoint-filter-test branch from f1f356f to d17383d Compare December 2, 2022 14:59
Copy link

@kaeluka kaeluka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I think we don't need to rely on the heavier PathNode class here. Node suffices.

Also suggested a name change to explicitly mention that the predicate only returns endpoints WITH FLOW.

private import experimental.adaptivethreatmodeling.XssThroughDomATM as XssThroughDomAtm

query predicate isSinkCandidateForQuery(
AtmConfig::AtmConfig queryConfig, JS::DataFlow::PathNode sink
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AtmConfig::AtmConfig queryConfig, JS::DataFlow::PathNode sink
AtmConfig::AtmConfig queryConfig, JS::DataFlow::Node sink

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to keep the test as similar as possible to the actual extraction queries. The extraction queries use DataFlow::PathNode (e.g. javascript/ql/experimental/adaptivethreatmodeling/src/SqlInjectionATM.ql). I don't know if there's a reason they do this or not, but if we want to change those to DataFlow::Node (in which case we can change these as well), we should do so in a separate PR.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extraction queries use the PathNodes for a specific reason - namely, that the UI should be able to list a specific path from source to sink. This is not needed in this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment here

@kaeluka
Copy link

kaeluka commented Dec 2, 2022

Does it make sense that there are no XssThroughDom sink candidates with flow from a source in endpoint_large_scale?

No, IMO

Was that set created specifically for our existing four queries?

No, IMO (but I wasn't there)

If so, do we need to add to it each time we boost a new query?

I think so, yes.

@jhelie
Copy link
Contributor

jhelie commented Dec 2, 2022

Please update the issue template if we need to consider updating endpoint_large_scale when adding a new query.

@owen-mc owen-mc changed the title Test for endpoints scored at inference time ATM: Test for endpoints scored at inference time Dec 2, 2022
Co-authored-by: Stephan Brandauer <kaeluka@github.com>
@tiferet tiferet requested a review from kaeluka December 2, 2022 17:01
@tiferet
Copy link
Contributor Author

tiferet commented Dec 2, 2022

Please update the issue template if we need to consider updating endpoint_large_scale when adding a new query.

@jhelie I added a line about it here.

@henrymercer
Copy link
Contributor

henrymercer commented Dec 2, 2022

@kaeluka / @henrymercer Does it make sense that there are no XssThroughDom sink candidates with flow from a source in endpoint_large_scale? Was that set created specifically for our existing four queries? If so, do we need to add to it each time we boost a new query?

Stephan is correct, and I'll add some more context. See https://github.com/github/codeql/tree/main/javascript/ql/experimental/adaptivethreatmodeling/test/endpoint_large_scale/autogenerated for a description of how endpoint_large_scale is generated. Ideally, the files copied from javascript/ql/test/query-tests/Security/CWE-079 would contain some sink candidates with flow for XSS through DOM, but it looks like the model isn't finding anything new there.

The test set was not created specifically for the existing four queries, but in general we will need to check it whenever we boost a new query to ensure it covers the new query.

@tiferet
Copy link
Contributor Author

tiferet commented Dec 2, 2022

@kaeluka / @henrymercer Does it make sense that there are no XssThroughDom sink candidates with flow from a source in endpoint_large_scale? Was that set created specifically for our existing four queries? If so, do we need to add to it each time we boost a new query?

Stephan is correct, and I'll add some more context. See https://github.com/github/codeql/tree/main/javascript/ql/experimental/adaptivethreatmodeling/test/endpoint_large_scale/autogenerated for a description of how endpoint_large_scale is generated. Ideally, the files copied from javascript/ql/test/query-tests/Security/CWE-079 would contain some sink candidates with flow for XSS through DOM, but it looks like the model isn't finding anything new there.

The test set was not created specifically for the existing four queries, but in general we will need to check it whenever we boost a new query to ensure it covers the new query.

Thanks! @jhelie I linked this answer in the issue template as well, for when you get back to the XssThroughDom work.

Copy link

@kaeluka kaeluka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR LGTM, although I'm a bit unclear whether the discussion between @tiferet and Jean has been resolved already.

  • I'm approving this, assuming that all things related to the conversation will be resolved in a different PR. This is how I understood the conversation.
  • The discussion about PathNodes is not worth losing time over (but it somewhat has bumped the urgency with which I want to look into that class's implementation ^^)

Also: thanks, @henrymercer for weighing in! I never had to look "inside" those tests before, so I appreciate the background info. I actually thought they were hand-crafted for the individual queries.

@tiferet
Copy link
Contributor Author

tiferet commented Dec 2, 2022

The PR LGTM, although I'm a bit unclear whether the discussion between @tiferet and Jean has been resolved already.

That was actually a conversation about the addition of XssThroughDom, that ended up here just because this PR revealed that our test set lacks XssThroughDom examples 😄. That's part of the XssThroughDom work, though, unrelated to this PR.

@tiferet tiferet merged commit 79d8444 into main Dec 2, 2022
@tiferet tiferet deleted the tiferet/endpoint-filter-test branch December 2, 2022 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants