@ttaylorr ttaylorr commented Mar 8, 2019

When Git wishes to continue one or more of a commit's extra headers on
more than a single line, it writes out the following:

parent <SHA-1>
tree <SHA-1>
...
gpgsig -----BEGIN PGP SIGNATURE-----
 <signature>
 -----END PGP SIGNATURE-----

Our current parsing implementation does not handle this correctly, based
on a misunderstanding that one line is equivalent to one extra header,
and vice versa.

In fact, the situation is presently even more dire than merely parsing
the 'gpgsig' header incorrectly: we'll split the signature and ending
lines into their own "headers" and in doing so trim off the leading
whitespace. In practice, this means that we can corrupt commits when
round-tripping them in many interesting ways [1].

To address the situation, we do two things:

  1. Teach gitobj that when we are parsing extra headers for a commit,
    and a header line begins with a single whitespace character, we
    are in fact continuing the last known header.

  2. Likewise, teach gitobj that when encoding a commit which has an
    extra header whose value contains a LF character, replace each LF
    with a LF followed by a single space, to round trip commits of
    this form successfully.

Together, (1) and (2) mean that we parse the 'gpgsig' header in the
above example as a single entry in the commit's 'ExtraHeaders' field,
as expected.

[1]: git-lfs/git-lfs#3530

/cc @git-lfs/core, especially @bk2204
/cc git-lfs/git-lfs#3530

 for _, hdr := range c.ExtraHeaders {
-	n3, err := fmt.Fprintf(to, "%s %s\n", hdr.K, hdr.V)
+	n3, err := fmt.Fprintf(to, "%s %s\n",
+		hdr.K, strings.Replace(hdr.V, "\n", "\n ", -1))
Contributor Author

In future versions of Go, this line can be replaced with strings.ReplaceAll(hdr.V, "\n", "\n "), but this was introduced in Go 1.12, which we don't build against yet.
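
For reference, a small sketch of the equivalence mentioned in this comment. The `foldOld` and `foldNew` helper names are hypothetical and exist only for the comparison; the second one needs Go 1.12 or later to compile.

```go
package main

import (
	"fmt"
	"strings"
)

// foldOld uses strings.Replace with n = -1 (no limit on replacements),
// which is available on Go versions before 1.12.
func foldOld(v string) string {
	return strings.Replace(v, "\n", "\n ", -1)
}

// foldNew uses strings.ReplaceAll, which was added in Go 1.12.
func foldNew(v string) string {
	return strings.ReplaceAll(v, "\n", "\n ")
}

func main() {
	v := "line one\nline two\nline three"
	fmt.Println(foldOld(v) == foldNew(v)) // true
}
```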

Member

@bk2204 bk2204 left a comment

Overall, I think this is a great improvement and I'm excited to see it. I'd like to see an additional test, though, to make sure we round-trip things properly and to help us avoid breaking things in the future.

@ttaylorr ttaylorr force-pushed the ttaylorr/multi-line-headers branch from ab42a1c to 930b3ff Compare March 9, 2019 04:47
ttaylorr added 2 commits March 8, 2019 21:09
When parsing an extra header that is continued over multiple lines, an
earlier check on the length of whitespace-separated fields caused the
loop to terminate early, dropping continuation lines that consist only
of whitespace.

Tweak the logic slightly in order to capture these, and allow us to
successfully round-trip commit parsing.
Member

@bk2204 bk2204 left a comment

This looks great! Thanks for adding the new test.

@ttaylorr ttaylorr merged commit b35104c into master Mar 11, 2019
@ttaylorr ttaylorr deleted the ttaylorr/multi-line-headers branch March 11, 2019 17:41
Labels

bug Something isn't working
