@tromai
Contributor

@tromai tromai commented Jul 2, 2025

Summary

This Pull Request adds a new command called gen-build-spec.

usage: macaron gen-build-spec [-h] -purl PACKAGE_URL --database DATABASE
                              [--output-format OUTPUT_FORMAT]

options:
  -h, --help            show this help message and exit
  -purl PACKAGE_URL, --package-url PACKAGE_URL
                        The PURL string of the software component to generate build
                        spec for.
  --database DATABASE   Path to the database.
  --output-format OUTPUT_FORMAT
                        The output format. Can be rc-buildspec (Reproducible-central
                        build spec) (default "rc-buildspec")

This command generates a buildspec, which contains the build-related information for a PURL that Macaron has analyzed. The output file is stored at output/<purl_based_path>/macaron.buildspec, where <purl_based_path> is the directory structure derived from the input PackageURL.

An example

macaron analyze -purl pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0
macaron gen-build-spec -purl pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0 --database output/macaron.db

In this example, the final path to the output buildspec file is output/maven/org_apache_hugegraph/computer-k8s/macaron.buildspec
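
For scripting around this command (for example in CI), the output location can be derived from the PURL outside of Macaron. The following is a minimal sketch, not Macaron's implementation: the function name purl_based_buildspec_path is hypothetical, and the rule that dots in the namespace become underscores is inferred only from the example path above. It handles only simple PURLs of the form pkg:type/namespace/name@version.

```python
from pathlib import PurePosixPath


def purl_based_buildspec_path(purl: str, output_dir: str = "output") -> PurePosixPath:
    """Guess the buildspec path for a PURL (hypothetical helper, simple PURLs only)."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"Not a PURL: {purl!r}")
    remainder = purl[len("pkg:"):]
    # Drop the optional subpath (#...) and qualifiers (?...).
    remainder = remainder.split("#", 1)[0].split("?", 1)[0]
    # Drop the optional version (@...).
    coordinates = remainder.rsplit("@", 1)[0]
    parts = coordinates.split("/")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"PURL needs at least a type and a name: {purl!r}")
    purl_type, *namespace_parts, name = parts
    # Assumption taken from the example path: "." in the namespace becomes "_".
    namespace_dirs = [part.replace(".", "_") for part in namespace_parts]
    return PurePosixPath(output_dir, purl_type, *namespace_dirs, name, "macaron.buildspec")


if __name__ == "__main__":
    print(purl_based_buildspec_path("pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0"))
```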

The content of output/maven/org_apache_hugegraph/computer-k8s/macaron.buildspec, which uses the Reproducible Central buildspec format:

# Copyright (c) 2025, Oracle and/or its affiliates.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
# Generated by Macaron version 0.15.0

# Input PURL - pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0
# Initial default JDK version 8 and default build command [['mvn', '-DskipTests=true', '-Dmaven.test.skip=true', '-Dmaven.site.skip=true', '-Drat.skip=true', '-Dmaven.javadoc.skip=true', 'clean', 'package']].

groupId=org.apache.hugegraph
artifactId=computer-k8s
version=1.0.0

gitRepo=https://github.com/apache/hugegraph-computer

gitTag=d2b95262091d6572cc12dcda57d89f9cd44ac88b

tool=mvn
jdk=8

newline=lf

command="mvn -DskipTests=true -Dmaven.test.skip=true -Dmaven.site.skip=true -Drat.skip=true -Dmaven.javadoc.skip=true clean package"

buildinfo=target/computer-k8s-1.0.0.buildinfo

Ideally, this Buildspec can be used directly as part of the Reproducible Central rebuild infrastructure.

Description of changes

Macaron database extractor

The first step to generate a buildspec is to extract the build related information from an existing Macaron SQLite database. The module macaron_db_extractor.py added in this commit does just that.

It uses SQLAlchemy SELECT statements on ORM mapped classes to load the data from the database into instances of the ORM mapped classes that we defined, for example, in src/macaron/database/table_definitions.py.

Maven and Gradle CLI Command Parser

We use the build commands obtained from CI/CD configuration (e.g. from a GitHub Actions workflow YAML file) for the final buildspec. However, those build commands cannot be used as is; they require some additional patching to work as a rebuild command.

A proper way to patch any Maven or Gradle CLI build command is to first parse it. The Maven and Gradle CLI command parsers added in this commit leverage Python's built-in argparse library.
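
To illustrate the idea, here is a minimal sketch of an argparse-based parser for a small subset of the Maven CLI. It is not the parser added in this PR: the names ParsedMavenCommand and parse_maven_command are hypothetical, and only -D, -P, -B and the goals are modelled. The sketch uses parse_intermixed_args so that goals and options can appear in any order.

```python
import argparse
import shlex
from dataclasses import dataclass, field


@dataclass
class ParsedMavenCommand:
    """Hypothetical container for the parts of a parsed mvn command."""

    tool: str
    properties: dict = field(default_factory=dict)
    profiles: str = ""
    batch_mode: bool = False
    goals: list = field(default_factory=list)


def parse_maven_command(command: str) -> ParsedMavenCommand:
    tokens = shlex.split(command)
    if not tokens:
        raise ValueError("Empty command")
    # add_help=False because -h belongs to mvn itself in a real command line.
    parser = argparse.ArgumentParser(prog="mvn", add_help=False)
    # "-Dkey=value" is handled by argparse as the short option -D with an attached value.
    parser.add_argument("-D", "--define", action="append", default=[], dest="defines")
    parser.add_argument("-P", "--activate-profiles", default="", dest="profiles")
    parser.add_argument("-B", "--batch-mode", action="store_true", dest="batch_mode")
    parser.add_argument("goals", nargs="*")
    # Note: argparse exits the process on unknown options; a real parser would handle that.
    args = parser.parse_intermixed_args(tokens[1:])

    properties = {}
    for item in args.defines:
        key, sep, value = item.partition("=")
        # Maven treats a bare "-Dkey" as "-Dkey=true".
        properties[key] = value if sep else "true"
    return ParsedMavenCommand(tokens[0], properties, args.profiles, args.batch_mode, args.goals)


if __name__ == "__main__":
    print(parse_maven_command("mvn -B -DskipTests=true clean package"))
```

Working on the parsed structure rather than on the raw string means a later patching step can override a single property without string manipulation.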

CLI Build Command Patcher

The modules added in this commit use the Maven and Gradle CLI command parsers to parse and patch the build commands obtained from the Macaron database.
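
As a rough sketch of what patching a parsed command could look like (an assumption for illustration, not the patcher in this PR): the function patch_maven_command and its patch format are hypothetical. Here a patch value of None removes a property, any other value sets it, and the goals can be replaced wholesale.

```python
from typing import Optional


def patch_maven_command(
    properties: dict,
    goals: list,
    property_patches: dict,
    goal_patch: Optional[list] = None,
) -> list:
    """Apply patches to already-parsed Maven properties and goals (hypothetical helper)."""
    patched = dict(properties)
    for key, value in property_patches.items():
        if value is None:
            # None means "remove this property if present".
            patched.pop(key, None)
        else:
            patched[key] = value
    final_goals = list(goals) if goal_patch is None else list(goal_patch)
    # Rebuild the argv list: tool, then -D properties, then goals.
    return ["mvn", *[f"-D{key}={value}" for key, value in patched.items()], *final_goals]


if __name__ == "__main__":
    argv = patch_maven_command(
        {"skipTests": "false", "gpg.skip": "true"},
        ["clean", "deploy"],
        {"skipTests": "true", "gpg.skip": None},
        ["clean", "package"],
    )
    print(" ".join(argv))
```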

JDK version finding from Java Maven Central artifacts

Macaron can obtain the JDK version for a given build command from the CI/CD configuration. In some cases, the CI/CD configuration doesn't have enough information for us to obtain the JDK version. Therefore, we also rely on the JDK version included in META-INF/MANIFEST.MF in Java artifacts from Maven Central https://repo1.maven.org/.

The module jdk_finder.py added in this commit downloads the Java artifacts from Maven Central for a given maven-type PURL, then returns the JDK version if it is available in META-INF/MANIFEST.MF.
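
The manifest lookup itself can be sketched with the standard library. This is a minimal illustration, not the code in jdk_finder.py: the function name jdk_version_from_jar is hypothetical, the download step is left out, and checking the Build-Jdk-Spec and Build-Jdk attributes (in that order) is an assumption about which manifest entries carry the JDK version.

```python
import io
import zipfile
from typing import Optional


def jdk_version_from_jar(jar_bytes: bytes) -> Optional[str]:
    """Return the JDK version recorded in a JAR's manifest, if any (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        try:
            manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:
            # The archive has no manifest at all.
            return None

    attributes = {}
    for line in manifest.splitlines():
        # Continuation lines start with a space; this sketch simply ignores them.
        if line.startswith(" "):
            continue
        key, sep, value = line.partition(": ")
        if sep:
            attributes[key.strip()] = value.strip()

    # Assumption: these are the attributes that may hold the build JDK version.
    for key in ("Build-Jdk-Spec", "Build-Jdk"):
        if key in attributes:
            return attributes[key]
    return None
```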

In some cases, the JDK version string from META-INF/MANIFEST.MF contains more than just the JDK major version. For example:

Because the Reproducible Central Buildspec requires only the major version of the JDK, we need to extract that major version. The jdk_version_normalizer.py module contains the logic to do just that. It is added in this commit.
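
A minimal sketch of such a normalization, under stated assumptions: the function name normalize_jdk_version is hypothetical, the input strings in the usage lines are illustrative rather than taken from this PR, and the only rule applied is that legacy "1.x" versions map to major version "x".

```python
import re
from typing import Optional


def normalize_jdk_version(version: str) -> Optional[str]:
    """Extract the JDK major version from a version string (hypothetical helper)."""
    match = re.match(r"^\s*(\d+)(?:\.(\d+))?", version)
    if not match:
        return None
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8.0_292" means Java 8.
    if first == "1" and second is not None:
        return second
    return first


if __name__ == "__main__":
    for raw in ("1.8.0_292", "11.0.2", "17"):
        print(raw, "->", normalize_jdk_version(raw))
```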

Generating the Reproducible Central Buildspec

The two commits use all components listed above to generate the final Reproducible Central Buildspec.

Testing

This feature includes unit tests for all components used in RC Buildspec generation (e.g. CLI parsers, CLI patchers, JDK version finder, etc.)

For integration tests, a new script called compare_rc_build_spec.py is added to compare the generated Buildspec in the integration tests.
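
One way such a comparison could work is sketched below. This is not the content of compare_rc_build_spec.py: the functions parse_buildspec and diff_buildspecs are hypothetical, and ignoring comment lines (which include the Macaron version) is a design assumption made so that the comparison does not break on every release.

```python
from typing import Optional


def parse_buildspec(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and # comments (hypothetical helper)."""
    fields = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as the command may contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Not a key=value line: {raw_line!r}")
        fields[key.strip()] = value.strip()
    return fields


def diff_buildspecs(expected_text: str, actual_text: str) -> dict:
    """Return {key: (expected, actual)} for every field that differs."""
    expected = parse_buildspec(expected_text)
    actual = parse_buildspec(actual_text)
    differences = {}
    for key in sorted(expected.keys() | actual.keys()):
        expected_value: Optional[str] = expected.get(key)
        actual_value: Optional[str] = actual.get(key)
        if expected_value != actual_value:
            differences[key] = (expected_value, actual_value)
    return differences
```

An empty result means the two files agree on every field, regardless of comments and blank lines.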

Checklist

  • I have reviewed the contribution guide.
  • My PR title and commits follow the Conventional Commits convention.
  • My commits include the "Signed-off-by" line.
  • I have signed my commits following the instructions provided by GitHub. Note that we run GitHub's commit verification tool to check the commit signatures. A green verified label should appear next to all of your commits on GitHub.
  • I have updated the relevant documentation, if applicable.
  • I have tested my changes and verified they work as expected.
  • Come up with proper patching values for Reproducible Central format. Right now it's empty.
  • Added integration tests for the feature. Make sure to test on Docker image.
  • Added unit test for the reproducible central build spec generation format
  • Revise all remaining TODOs in the PR
  • Rebase the branch to properly split the changes into separate commits, making it easier to review.
  • Update the api docs in the Sphinx documentation (could be in a different PR)
  • Add a tutorial on how to support generating build spec of a different format (e.g. support new build tool patching, etc.) (could also be in a different PR)
  • Generate the build spec to a PURL based path instead of output/macaron.buildspec (could also be in a different PR).
  • Add a test case for prioritizing the JDK version obtained from the Maven Central JAR over the one obtained from the build command lookup
  • Consider removing the "extra_comments" part in the generated build spec?

@tromai tromai self-assigned this Jul 2, 2025
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 2, 2025
@tromai tromai added feature A new feature request and removed OCA Verified All contributors have signed the Oracle Contributor Agreement. labels Jul 2, 2025
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 2, 2025
@tromai tromai force-pushed the tromai/add-build-spec-generation branch 2 times, most recently from 2547333 to 3014540 Compare July 9, 2025 12:25
@tromai tromai changed the title feat: add build spec generation feat: add reproducible central buildspec generation Jul 9, 2025
@tromai tromai force-pushed the tromai/add-build-spec-generation branch from 3014540 to da329fe Compare July 9, 2025 12:34
@tromai tromai marked this pull request as ready for review July 9, 2025 13:21
@tromai tromai requested a review from behnazh-w as a code owner July 9, 2025 13:21
@tromai
Contributor Author

tromai commented Jul 15, 2025

Because the path to the database is provided as a CLI argument, we also need to support mounting this database file into the container file system.

@behnazh-w
Member

@tromai Instead of storing the build spec in output/macaron.buildspec, what do you think about using the standard PURL-based path? That would help prevent overwrites and improve archiving outputs.

@tromai
Contributor Author

tromai commented Jul 21, 2025

@behnazh-w I only use output/macaron.buildspec as I didn't really know a better way to implement the output path for this feature. This feature is similar to the VSA generation, in the sense that it generates one file from an input PURL.

I can definitely change it to a PURL based path format. The only downside of using a PURL based path here is that it would be a bit challenging for scripting in a CI/CD environment, if that is the targeted use case (you might need to compute the PURL based path from outside of Macaron).

@behnazh-w
Member

This feature is similar to the VSA generation, in the sense that it generates one file from an input PURL.

I think the VSA feature is a little different because it is generated by the policy engine that enforces a given policy, which is not applied to a PURL. The subject instead is specified in the policy itself.

I can definitely change it to a PURL based path format. The only downside of using a PURL based path here is that it would be a bit challenging for scripting in a CI/CD environment, if that is the targeted use case (you might need to compute the PURL based path from outside of Macaron).

Yes, that's true, but I think overall it's less error-prone and we avoid overwriting previous results.

@tromai tromai added the build_spec_generation Related to the build spec generation feature. label Jul 21, 2025
@tromai tromai force-pushed the tromai/add-build-spec-generation branch 2 times, most recently from c928fb1 to 4a4ac5d Compare August 8, 2025 06:48
@tromai tromai force-pushed the tromai/add-build-spec-generation branch 2 times, most recently from 0ff4d6c to 6a3fdb4 Compare August 18, 2025 23:53
@behnazh-w behnazh-w requested review from benmss and nicallen August 19, 2025 01:31
Trong Nhan Mai and others added 19 commits September 15, 2025 14:31
…spec script

Signed-off-by: Trong Nhan Mai <trong.nhan.mai@oracle.com>
Signed-off-by: Trong Nhan Mai <trong.nhan.mai@oracle.com>
…check result was mistakenly joined on the checkfacts.id instead of checkfact.check_result_id
… because without repository no build tool is found

This commit also adds some useful debug messages for extracting values
from the database for Reproducible Central buildspec generation.
… the get latest component for purl select statement
…D and repository from the lookup Component object instead of having their own SELECT query
… the generation of reproducible central build spec
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
Signed-off-by: behnazh-w <behnaz.hassanshahi@oracle.com>
Signed-off-by: behnazh-w <behnaz.hassanshahi@oracle.com>
Signed-off-by: behnazh-w <behnaz.hassanshahi@oracle.com>
@behnazh-w behnazh-w force-pushed the tromai/add-build-spec-generation branch from 489f979 to a9be2e3 Compare September 15, 2025 04:34
behnazh-w previously approved these changes Sep 15, 2025
@behnazh-w behnazh-w force-pushed the tromai/add-build-spec-generation branch from b73c59c to 894461b Compare September 16, 2025 01:47
Signed-off-by: behnazh-w <behnaz.hassanshahi@oracle.com>
@behnazh-w behnazh-w force-pushed the tromai/add-build-spec-generation branch from 894461b to 211c23e Compare September 16, 2025 01:50
@behnazh-w behnazh-w added this to the Release version 0.18.0 milestone Sep 16, 2025
@behnazh-w behnazh-w merged commit 8b3c4b5 into main Sep 16, 2025
10 of 11 checks passed

Labels

build_spec_generation Related to the build spec generation feature. feature A new feature request OCA Verified All contributors have signed the Oracle Contributor Agreement.

4 participants