Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protoc-gen-go: generate MarshalBinary/UnmarshalBinary methods for messages #210

Open
phifty opened this issue Aug 2, 2016 · 19 comments
Open
Labels
generator-proto-option involves generators respecting a proto option to control generated source output proposal
Milestone

Comments

@phifty
Copy link

phifty commented Aug 2, 2016

If the generated code would include the methods MarshalBinary and UnmarshalBinary from Go's stdlib, the corresponding interfaces from the encoding package would be satisfied. Storage/Transmitting-Code could be unified and a dependency to the protobuf package could be removed. The methods would look like:

func (m *Model) MarshalBinary() ([]byte, error) {
    return proto.Marshal(m)
}

func (m *Model) UnmarshalBinary(data []byte) error {
    return proto.Unmarshal(data, m)
}
@dsnet
Copy link
Member

dsnet commented Aug 2, 2016

Hi, may we get more information about the use case for this? The encoding.BinaryMarshaler and encoding.BinaryUnmarshaler were intended to be used by data serialization packages. At least in the standard library, only gob uses that interface. Even without Model implementing those interfaces, gob should be able serialize them just fine. Is there some other serialization package that we're not aware of?

@phifty
Copy link
Author

phifty commented Aug 3, 2016

Wow, thanks for the prompt reply.
My use case is pretty simple. Let's assume, there is function that stores a given model somewhere. Since we want to separate concerns, it should only store the model and not serialize it. The signature would look like:

func store(key string, model encoding.BinaryMarshaler) error { ... }

This way it takes a model that somehow serialize to a binary format and the function does not have to worry about how this is done. Further the storage-package doesn't have any dependency on the proto, gob or json package, so if the serialization format changes, the storage code stay the same.

Currently, I'm using Protocol Buffers for the models and manually implement the MarshalBinary and UnmarshalBinary methods on them. It would easier to have an option in the .proto file to make the generator create them.

@dsnet
Copy link
Member

dsnet commented Aug 4, 2016

Just wondering, wouldn't store be more versatile with the following?

func store(key string, model interface{}) error { ... }

And under the hood, store uses gob? It does add a dependency on gob, but gob is fairly versatile in being able to serialize to binary format nearly any Go data structure, regardless of whether is a BinaryMarshaler or not.

I'm not opposed to adding MarshalBinary/UnmarshalBinary, but I would like to explore all options first.

@phifty
Copy link
Author

phifty commented Aug 5, 2016

I agree that this would work on 90% of the use cases. But what if I have a model that should serialize all of it's fields, except it's Password field? A per-model decision how the serialization is done seems more convenient here. Or if all the existing models has been serialized with gob and the new ones should use the proto package.

To me it's seem like a better design if a store function just stores a thing that can be serialized and don't have an opinion about how this is done.

My other argument is, that I find

func store(key string, model encoding.BinaryMarshaler) error { ... }

a bit more expressive. I can tell what's going to happend just by looking at the function's signature. If interface{} would be allowed, I would not know what happens if a chan or func is passed. I would have to read the function's statements and the gob documentation.

I think implicit implemented interfaces is the most powerful feature of Go and as a consequence the stdlib-interfaces became kind of a common language for libraries. io.Reader had it's success-story and encoding.BinaryMarshaler could have one too ;-)

@dsnet
Copy link
Member

dsnet commented Aug 5, 2016

I think there's a valid use case for add encoding.BinaryMarshaler and encoding.BinaryUnmarshaler.

@phifty
Copy link
Author

phifty commented Aug 6, 2016

Great! Thanks for considering the proposal. This would help me a lot.

I've started to implement some helpers for BoltDB that implement such interfaces: https://github.com/simia-tech/boltx

@dsnet dsnet added the proposal label Feb 14, 2018
@dsnet dsnet changed the title proposal: generate MarshalBinary/UnmarshalBinary methods for messages proposal: protoc-gen-go: generate MarshalBinary/UnmarshalBinary methods for messages Feb 14, 2018
@dsnet dsnet added the generator-proto-option involves generators respecting a proto option to control generated source output label Mar 10, 2018
@dsnet dsnet removed the enhancement label Jul 10, 2019
@dsnet dsnet changed the title proposal: protoc-gen-go: generate MarshalBinary/UnmarshalBinary methods for messages protoc-gen-go: generate MarshalBinary/UnmarshalBinary methods for messages Mar 4, 2020
@adam-azarchs
Copy link

To provide further motivation for this 4.5-year-old feature request, gocloud.dev/docstore is another example of a package which is aware of MarshalBinary and UnmarshalBinary methods. Fields on types passed to that API must either be of a primitive type or must implement encoding.BinaryMarshaler or encoding.TextMarshaler.

Note that I am not suggesting that messages should also implement encoding.TextMarshaler (with e.g. textproto), as there's probably a bunch of cases where that would or might result in undesirable behavior, the most obvious being json (unless it also implemented json.Marshaler). I don't believe that encoding.BinaryMarshaler poses similar risks, however.

@dsnet dsnet added this to the unplanned milestone Mar 29, 2021
@atombender
Copy link

This would also benefit Go-Redis, which relies on BinaryMarshaler to marshal values.

@sgtsquiggs
Copy link

Came here for the go-redis usecase. We're caching grpc responses in the raw and this would save us some boilerplate code.

Alternatively this could be done trivially w/ a protoc plugin, like this one that implement the encoding/json iface: https://github.com/mitchellh/protoc-gen-go-json

@derekperkins
Copy link

Is this just waiting for a PR, or does an official decision need to be reached?

@derekperkins
Copy link

I went ahead and mailed a commit to add this, open to feedback, but it's pretty simple
https://go-review.googlesource.com/c/protobuf/+/517315

@nightlyone
Copy link

Please keep in mind that those methods can easily be provided by an extra protoc plugin and may also provide a different a binary serialisation that isn't compatible with proto3 wire format at all. That would also allow steering via extra options designed for that usecase (e.g. declaring password immission as mentioned above).

Thinking further by relying on encoding.MarshalBinary for protobuf, encoding may accidentally generate a binary serialisation that isn't compatible with the wire format or accidentally omit valuable data like the password in aboves example.

So I am not sure that is needed for the task of generating models usable for protobuf serialisation/deserialisation.

So I have a hard time understanding why that change is in the scope of this tool.

@puellanivis
Copy link
Collaborator

I would strongly suggest that this be performed by an additional protoc plugin, rather than working it into the official library. As mentioned in the review of the change request, “The problem is that this leads to a non-trivial binary size increase.”

@neild
Copy link
Contributor

neild commented Aug 9, 2023

This adds a dependency on the proto package from the generated packages. I don't think we currently have that dependency link. I'm not sure what the implications of adding it are. Might be fine, anything that imports generated proto packages probably imports proto somewhere.

This could also break existing code that depends on these methods not existing. I'm not sure how plausible that is.

Neither point is necessarily an argument against the change, but something to consider.

You could probably use m.ProtoReflect().ProtoMethods().Marshal(...) instead of proto.Marshal (since the generated message knows that it supports the fast path), which would avoid the proto dependency and might be a bit faster. Not sure if that's worth it.

@halturin
Copy link

the main problem of encoding.MarshalBinary - it makes a huge pressure on GC since there is no way to reuse byte buffers.

I wish I had like

type BinaryMarshalerIO interface {
	MarshalBinary(w io.Writer) error
}

@meling
Copy link

meling commented Mar 15, 2025

Would the new BinaryAppender interface from Go 1.24 help here?

@halturin
Copy link

halturin commented Mar 16, 2025

Would the new BinaryAppender interface from Go 1.24 help here?

It doesn't cover the cases with reallocation. Having io.Writer allows you to keep control over the underlying buffer

PS probably better to name it BinaryMarshalerWriter

@adam-azarchs
Copy link

BinaryAppender helps with the GC traffic, as it allows you to pre-allocate and reuse a byte slice as the destination. It doesn't solve the problem of memory usage, however, if for example you wanted to stream the serialization directly into a file or network response, rather than buffering the entire serialized message in memory before sending it on1.

Note by the way that there is a cost to generating these methods, in that it messes with the compiler's ability to do dead-code elimination, as it is very difficult for the compiler to prove that a given type that could be used as that interface isn't. In this case, the functions in question are tiny so it I wouldn't think it would be a real problem.

There is a workaround here for those that need this functionality now. One can always write e.g.

func (m *Model) MarshalBinary() ([]byte, error) {
    return proto.Marshal(m)
}

func (m *Model) UnmarshalBinary(data []byte) error {
    return proto.Unmarshal(data, m)
}

into a sibling file in the same package as the generated code, since go doesn't require every method on a type to be defined in the same file. It's a bit of a PITA if you require this of all of your messages, however.

Footnotes

  1. in both those cases you'd probably want to wrap with a bufio.Writer to avoid an excessive number of syscalls, but the size of that buffer stays fixed regardless of the size of the message being serialized

@halturin
Copy link

BinaryAppender helps with the GC traffic, as it allows you to pre-allocate and reuse a byte slice as the destination. It doesn't solve the problem of memory usage, however, if for example you wanted to stream the serialization directly into a file or network response, rather than buffering the entire serialized message in memory before sending it on1.

seems you didnt get the idea. it is not about the writing to file/socket

I've created proposal for the encoding package golang/go#72915.

will see...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
generator-proto-option involves generators respecting a proto option to control generated source output proposal
Projects
None yet
Development

No branches or pull requests