commit.go: support multi-line header continuations #10
     for _, hdr := range c.ExtraHeaders {
    -    n3, err := fmt.Fprintf(to, "%s %s\n", hdr.K, hdr.V)
    +    n3, err := fmt.Fprintf(to, "%s %s\n",
    +        hdr.K, strings.Replace(hdr.V, "\n", "\n ", -1))
Once we build against Go 1.12 or newer, this line can be replaced with strings.ReplaceAll(hdr.V, "\n", "\n "). That function was introduced in Go 1.12, which we don't build against yet.
bk2204 left a comment:
Overall, I think this is a great improvement and I'm excited to see it. I'd like to see an additional test, though, to make sure we round-trip things properly and to help us avoid breaking things in the future.
When Git wishes to continue one or more of a commit's extra headers on
more than a single line, it writes out the following:

    tree <SHA-1>
    parent <SHA-1>
    gpgsig -----BEGIN PGP SIGNATURE-----
     <signature>
     -----END PGP SIGNATURE-----

Our current parsing implementation does not handle this correctly, based
on a misunderstanding that one line is equivalent to one extra header,
and vice versa.
In fact, the situation is presently even more dire than merely parsing
the 'gpgsig' header incorrectly: we split the signature and its ending
line into their own "headers", and in doing so trim off their leading
whitespace. In practice, this means that we can corrupt commits in many
interesting ways when round-tripping them [1].
To address the situation, we do two things:
1. Teach gitobj that when we are parsing extra headers for a commit,
_and_ a header line begins with a single whitespace character, we
are in fact continuing the last known header.
2. Likewise, teach gitobj that when encoding a commit which has an
   extra header whose value contains an LF character, it should replace
   each LF with an LF followed by a single space, so that commits of
   this form round-trip successfully.
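The parsing rule in (1) can be sketched as follows. This is a hypothetical illustration, not gitobj's code: parseExtraHeaders and ExtraHeader are assumed names, and the input is assumed to be the header lines with their trailing LFs already removed. Each continuation line has its single leading space stripped and is appended to the previous value after an LF, which undoes the encoding described in (2).

```go
package main

import (
	"fmt"
	"strings"
)

// ExtraHeader is a hypothetical stand-in for one extra header: key K, value V.
type ExtraHeader struct {
	K, V string
}

// parseExtraHeaders parses header lines (without trailing LFs). A line that
// begins with a single space continues the previous header: the space is
// dropped and the remainder is appended to the previous value after an LF.
func parseExtraHeaders(lines []string) ([]*ExtraHeader, error) {
	var hdrs []*ExtraHeader
	for _, line := range lines {
		if strings.HasPrefix(line, " ") {
			if len(hdrs) == 0 {
				return nil, fmt.Errorf("continuation line %q has no preceding header", line)
			}
			last := hdrs[len(hdrs)-1]
			last.V += "\n" + line[1:]
			continue
		}
		// A new header: the key runs up to the first space, the value is the rest.
		k, v := line, ""
		if i := strings.IndexByte(line, ' '); i >= 0 {
			k, v = line[:i], line[i+1:]
		}
		hdrs = append(hdrs, &ExtraHeader{K: k, V: v})
	}
	return hdrs, nil
}

func main() {
	lines := []string{
		"gpgsig -----BEGIN PGP SIGNATURE-----",
		" ",
		" <signature>",
		" -----END PGP SIGNATURE-----",
	}
	hdrs, err := parseExtraHeaders(lines)
	if err != nil {
		panic(err)
	}
	for _, h := range hdrs {
		fmt.Printf("%s: %q\n", h.K, h.V)
	}
}
```

With this input the four lines collapse into a single gpgsig entry, and the whitespace-only line becomes an empty line inside the value.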
Together, (1) and (2) mean that we parse the 'gpgsig' header in the
above example as a _single_ entry in the commit's 'ExtraHeaders' field,
as expected.
[1]: git-lfs/git-lfs#3530
Updated from ab42a1c to 930b3ff.
When parsing an extra header that is continued over multiple lines, an earlier check on the number of whitespace-separated fields caused the loop to terminate early, dropping continuation lines that consist only of whitespace. Tweak the logic slightly to capture these lines, so that parsed commits round-trip successfully.
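The PR does not show the original check, so the following is only a hypothetical reconstruction of the failure mode described above: treating any line with no whitespace-separated fields as the blank line that ends the headers. Because strings.Fields(" ") returns an empty slice, a whitespace-only continuation line (for example, an empty line inside a multi-line value, which the encoding turns into a single space) matches that test too. Both function names are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// endOfHeadersByFields reproduces the pitfall (hypothetical): any line with
// zero whitespace-separated fields is taken as the header/message separator,
// which wrongly includes the whitespace-only continuation line " ".
func endOfHeadersByFields(line string) bool {
	return len(strings.Fields(line)) == 0
}

// endOfHeaders treats only a truly empty line as the separator, leaving " "
// to the continuation rule (leading space => continue the last header).
func endOfHeaders(line string) bool {
	return line == ""
}

func main() {
	for _, line := range []string{"", " ", " <signature>", "gpgsig -----BEGIN PGP SIGNATURE-----"} {
		fmt.Printf("%q: byFields=%v exact=%v\n",
			line, endOfHeadersByFields(line), endOfHeaders(line))
	}
}
```

Only the line " " is classified differently by the two checks, which matches the symptom of the loop stopping early at a whitespace-only continuation line.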
bk2204 left a comment:
This looks great! Thanks for adding the new test.
/cc @git-lfs/core, especially @bk2204
/cc git-lfs/git-lfs#3530