-
Notifications
You must be signed in to change notification settings - Fork 1.9k
ATM: Simplify query configurations #11323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...imental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/NosqlInjectionATM.qll
Show resolved
Hide resolved
e6c76a5 to
9a37061
Compare
a543d95 to
d8ef142
Compare
9a37061 to
fac6641
Compare
d8ef142 to
42cc1bc
Compare
|
@kaeluka when you review this PR, do you mind looking at the possible issue Henry raised here. I think it's OK, but I have limited understanding of the issue he's pointing out. Henry said |
|
Thank you for point this out! I was going to, but I definitely want to talk to Henry about it as well! |
|
As part of digging into the questions brought up by Henry, I added the following snippet to each file modified in this PR and quick-evaluated it: predicate test(DataFlow::Configuration a) {
any()
}This showed me that there's actually a few files where there's several instances of These files are:
@henrymercer, is my reasoning sound? |
Your reasoning around the files you listed looks good. Most of those files are test or debug files. I'm not sure how you handled the case of libraries being changed, as we'd need to check all the files that use this library and not just the ones being changed. I did try evaluating the predicate top-down on each of the queries and they each reported a single dataflow configuration, so I think we're good to go. |
|
Perfect! I'll continue with the conventional part of the review :) This should be done ~tomorrow lunchtime, @tiferet. Only minor comments so far, but will take some time to think about the big picture tomorrow. |
fac6641 to
4580b55
Compare
|
@kaeluka Since you're already in the middle of reviewing this PR, I don't want to rebase it on top of the latest version of
|
kaeluka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The big question, whether this might introduce a pathological dependency graph, here has been tackled in the PR discussion already.
👍: The rest of the review is that I:
- Like the change and am not asking you to make any big changes.
- I'd slightly prefer to follow convention in the naming of the Config classes but it's not a hill I'll die on.
- I was a bit surprised to see the PR, as it seems like it's a bit of a side track from the grand design we're following. Happy to see it either way, though!
...ql/experimental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/ATMConfig.qll
Outdated
Show resolved
Hide resolved
...imental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/NosqlInjectionATM.qll
Outdated
Show resolved
Hide resolved
...imental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/NosqlInjectionATM.qll
Show resolved
Hide resolved
...erimental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/SqlInjectionATM.qll
Outdated
Show resolved
Hide resolved
javascript/ql/experimental/adaptivethreatmodeling/src/NosqlInjectionATM.ql
Show resolved
Hide resolved
javascript/ql/experimental/adaptivethreatmodeling/test/endpoint_large_scale/EndpointFeatures.ql
Outdated
Show resolved
Hide resolved
All remaining functionality in `CoreKnowledge` is only being used in `EndpointCharacteristics`, so it can be moved there as a small set of helper predicates.
All remaining functionality in `StandardEndpointFilters` is only being used in `EndpointCharacteristics`, so it can be moved there as a small set of helper predicates.
That way the specific configs which inherit from `AtmConfig` also inherit from `TaintTracking::Configuration`. This removes the need for two separate config classes for each query.
A long as we're not boosting sources, `isSource` is identical to `isKnownSource`.
Holds if `sink` is a known taint sink or an "effective" sink.
Define the query for finding ATM alerts in the base class `AtmConfig`, and call it from each query's .ql file.
6813358 to
6f807e9
Compare
Name the query configuration e.g. `NosqlInjectionATMConfig` rather than `Configuration`.
Yes, I think we're already done implementing the design. All the issues left here are improvements we can make either to the code or to the data now that the refactoring is complete 🎉 |
| * to this ML-boosted configuration, whereas the unboosted base query does not contain this source and sink | ||
| * combination. | ||
| */ | ||
| predicate getAlerts(JS::DataFlow::PathNode source, JS::DataFlow::PathNode sink, float score) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor/bikeshed: I'd prefer a name that doesn't suggest a return/result value, and mentions flow. Also makes it easier to see the parallels between unboosted and boosted queries. How about hasLikelyFlowPath?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about hasLikelyFlowPath, because one of the conditions it must meet is hasFlowPath. Logically hasFlowPath should imply hasLikelyFlowPath.
Maybe something like hasBoostedFlowPath? We've been trying to get rid of the "boosted" terminology, but I can't think of something better right now. I think hasAlert would be clearer (ultimately that's what this predicate does -- it surfaces all the alerts for this config), but that doesn't mention flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I tracked this renaming together with the other renamings here, so that we can merge this PR and XssThroughDOM and open a separate PR for all the renamings
This PR simplifies the query configurations. It moves common functionality from the derived classes up to
AtmConfig. It also combines the two configs each query previously had into a single config, simplifying the code and making it easier to understand. This will prevent code duplication and bugs, improve maintainability, and make it easier to add new queries.Once again commit-by-commit review is recommended 😄
Main changes
AtmConfiginherits fromTaintTracking::Configuration. That way the specific query configs (e.g.DomBasedXssAtmConfig), which inherit fromAtmConfig, also inherit fromTaintTracking::Configuration. This removes the need for two separate config classes for each query (e.g.DomBasedXssAtmConfigandXssATM::Configuration). Note: We could also keepAtmConfigas is and have the query configs inherit both from it and fromTaintTracking::Configurationdirectly, but I think havingAtmConfiginherit fromTaintTracking::Configurationmakes more sense: it's a query configuration for a boosted taint tracking query, which is a type of taint tracking query that adds a few extra predicates (e.g.isKnownSink).isSourceto the base class: A long as we're not boosting sources,isSourceis identical toisKnownSource.isSinkto the base class: Holds ifsinkis a known taint sink or an "effective" sink (a candidate to be classified by an ML model).AtmConfig, and call it from each query's .ql file.Consistency check
This DCA experiment uses the model on
mainwith the final commit of each PR from the CodeQL revamp. The idea is to double-check that none of the changes I've made has affected the endpoints that get scored at inference time, since this is not directly tested for in the PR checks. We see identical results from all variants (e.g. this), confirming that none of the revamp work has changed the endpoints being scored.Timing checks
endpoint_large_scale/ExtractEndpointDataTrainingremains like it was after the last PR that affected timing: About 5s.Addresses most of https://github.com/github/ml-ql-adaptive-threat-modeling/issues/2134.