Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MLIR][Python] IntegerAttribute doesn't accept unsigned values greater than 63 bits #128072

Open
teqdruid opened this issue Feb 20, 2025 · 11 comments
Labels
good first issue https://github.com/llvm/llvm-project/contribute mlir:python MLIR Python bindings

Comments

@teqdruid
Copy link
Contributor

# This works
ir.IntegerAttr.get(ir.IntegerType.get_signless(64), 0x7fffffffffffffff)

# This doesn't
ir.IntegerAttr.get(ir.IntegerType.get_signless(64), 0xffffffffffffffff)
TypeError: get(): incompatible function arguments. The following argument types are supported:
    1. get(type: pycde.circt._mlir_libs._mlir.ir.Type, value: int) -> pycde.circt._mlir_libs._mlir.ir.IntegerAttr

Invoked with types: pycde.circt._mlir_libs._mlir.ir.IntegerType, int

The problem is that the nanobind (and CAPI) calls use int64_t (signed) which 0xffffffffffffffff overflows. (This doesn't produce the nicest error message, but that's a problem with nanobind.) Generally speaking, we need a way to pass >= 64-bit values into nanobind (easy enough by using the python int type) and through the CAPI (the real issue). The 64-bit case is pretty easy to solve (using uint64_t), but we need a more generic way. In C++ world, we have APInt. Is there a C equivalent we could use? Is there some standard way we could support this?

@teqdruid teqdruid added the mlir:python MLIR Python bindings label Feb 20, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 20, 2025

@llvm/issue-subscribers-mlir-python

Author: John Demme (teqdruid)

```python # This works ir.IntegerAttr.get(ir.IntegerType.get_signless(64), 0x7fffffffffffffff)

This doesn't

ir.IntegerAttr.get(ir.IntegerType.get_signless(64), 0xffffffffffffffff)


TypeError: get(): incompatible function arguments. The following argument types are supported:
1. get(type: pycde.circt._mlir_libs._mlir.ir.Type, value: int) -> pycde.circt._mlir_libs._mlir.ir.IntegerAttr

Invoked with types: pycde.circt._mlir_libs._mlir.ir.IntegerType, int


The problem is that the nanobind (and CAPI) calls use `int64_t` (signed) which 0xffffffffffffffff overflows. (This doesn't produce the nicest error message, but that's a problem with nanobind.) Generally speaking, we need a way to pass >= 64-bit values into nanobind (easy enough by using the python int type) and through the CAPI (the real issue). The 64-bit case is pretty easy to solve (using `uint64_t`), but we need a more generic way. In C++ world, we have APInt. Is there a C equivalent we could use? Is there some standard way we could support this?
</details>

@makslevental
Copy link
Contributor

makslevental commented Feb 20, 2025

In C++ world, we have APInt. Is there a C equivalent we could use? Is there some standard way we could support this?

I have been told by @stellaraccident that llvm/Support is exempt from the decree that "no C++ shall be linked to python extensions" - see MLIRPythonExtension.Core::PRIVATE_LINK_LIBS. So possibly the solution here is simply to add APIs that pass APInt back and forth between the python extension and the code in libMLIRPython.

@stellaraccident
Copy link
Contributor

It's probably hopeless to try to standardize on the internal representation of an arbitrary precision int like what python uses. With that said, what you need is a c struct that lets you get in/out of both APInt and the procedure python has for accessing large ints. APInt itself is pretty simple in this regard: it has a numbits field and a union that either holds a single uint64 or a pointer to an array of uint64 that is heap allocated if more than one word.

This isn't a great design for dealing with a lot of large values... Will result in a lot of heap allocations.

What I'd do: define an apint_interop_t struct that has a number of bits and a pointer to uint64 words that is at least of size numbits / 64. For APIs that need to take an APInt, the caller just populates that as needed. For those that return, have the caller pass in a pointer to the struct with numbits set to the max that is allocated. Then if too small, have the read function return a failure and populate the actual numbits needed.

It's annoying, but you can build reasonably efficient interop with such a mechanism, even for the array cases, if a bit careful about it.

@teqdruid
Copy link
Contributor Author

Yeah, that sounds like the response I expected. I had hoped that someone would point me to a C APInt interop which LLVM has already using somewhere.

I'll keep this in mind for the future. I'm in sprint/hack to a demo mode right now.

Or if anyone else wants to do it. Strikes me as a pretty good first project.

@teqdruid teqdruid added the good first issue https://github.com/llvm/llvm-project/contribute label Feb 21, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 21, 2025

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

  1. Check that no other contributor has already been assigned to this issue. If you believe that no one is actually working on it despite an assignment, ping the person. After one week without a response, the assignee may be changed.
  2. In the comments of this issue, request for it to be assigned to you, or just create a pull request after following the steps below. Mention this issue in the description of the pull request.
  3. Fix the issue locally.
  4. Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
  5. Create a Git commit.
  6. Run git clang-format HEAD~1 to format your changes.
  7. Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation. Mention this issue in the description of the pull request.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.

@llvmbot
Copy link
Member

llvmbot commented Feb 21, 2025

@llvm/issue-subscribers-good-first-issue

Author: John Demme (teqdruid)

```python # This works ir.IntegerAttr.get(ir.IntegerType.get_signless(64), 0x7fffffffffffffff)

This doesn't

ir.IntegerAttr.get(ir.IntegerType.get_signless(64), 0xffffffffffffffff)


TypeError: get(): incompatible function arguments. The following argument types are supported:
1. get(type: pycde.circt._mlir_libs._mlir.ir.Type, value: int) -> pycde.circt._mlir_libs._mlir.ir.IntegerAttr

Invoked with types: pycde.circt._mlir_libs._mlir.ir.IntegerType, int


The problem is that the nanobind (and CAPI) calls use `int64_t` (signed) which 0xffffffffffffffff overflows. (This doesn't produce the nicest error message, but that's a problem with nanobind.) Generally speaking, we need a way to pass &gt;= 64-bit values into nanobind (easy enough by using the python int type) and through the CAPI (the real issue). The 64-bit case is pretty easy to solve (using `uint64_t`), but we need a more generic way. In C++ world, we have APInt. Is there a C equivalent we could use? Is there some standard way we could support this?
</details>

@teqdruid
Copy link
Contributor Author

In C++ world, we have APInt. Is there a C equivalent we could use? Is there some standard way we could support this?

I have been told by @stellaraccident that llvm/Support is exempt from the decree that "no C++ shall be linked to python extensions" - see MLIRPythonExtension.Core::PRIVATE_LINK_LIBS. So possibly the solution here is simply to add APIs that pass APInt back and forth between the python extension and the code in libMLIRPython.

This seems pretty hacky since we'd have to add a C++ API somewhere which supports the CAPI attributes and APInt. Possible, but ugly as it doesn't solve the problem for non-C++ users.

@bababuck
Copy link

@teqdruid @stellaraccident It doesn't seem like anyone has started work on this, could I give it a go?

@stellaraccident
Copy link
Contributor

Sure - tag me explicitly if you'd like my help on reviews... I'm not otherwise watching super closely.

@teqdruid
Copy link
Contributor Author

Sure - tag me explicitly if you'd like my help on reviews... I'm not otherwise watching super closely.

Same here

teqdruid added a commit to llvm/circt that referenced this issue Feb 27, 2025
llvm/llvm-project#128072

To mitigate this, if a constant over 63-bits is specified, create it by
concatenating 32-bit constants.
@bababuck
Copy link

bababuck commented Mar 6, 2025

@stellaraccident @teqdruid

Apolgies for the slow movement on this.

My understanding is that the interfaces involved are:
Python interface:

get (PyType &type, int64_t value) {
       MlirAttribute attr = mlirIntegerAttrGet(type, value);
       return PyIntegerAttribute(type.getContext(), attr);
};

static py::int_ toPyInt(PyIntegerAttribute &self) {
    MlirType type = mlirAttributeGetType(self);
    if (mlirTypeIsAIndex(type) || mlirIntegerTypeIsSignless(type))
          return mlirIntegerAttrGetValueInt(self);
    if (mlirIntegerTypeIsSigned(type))
          return mlirIntegerAttrGetValueSInt(self);
    return mlirIntegerAttrGetValueUInt(self);

CAPI:

/// Returns the value stored in the given integer attribute, assuming the value
/// is of signless type and fits into a signed 64-bit integer.
MLIR_CAPI_EXPORTED int64_t mlirIntegerAttrGetValueInt(MlirAttribute attr);

/// Returns the value stored in the given integer attribute, assuming the value
/// is of signed type and fits into a signed 64-bit integer.
MLIR_CAPI_EXPORTED int64_t mlirIntegerAttrGetValueSInt(MlirAttribute attr);

/// Returns the value stored in the given integer attribute, assuming the value
/// is of unsigned type and fits into an unsigned 64-bit integer.
MLIR_CAPI_EXPORTED uint64_t mlirIntegerAttrGetValueUInt(MlirAttribute attr);

For Python, my understanding is that way in and out for large integers in python is via use of py::byte.

I just want to make sure I'm understanding correctly what was meant by the following:

For APIs that need to take an APInt, the caller just populates that as needed. For those that return, have the caller pass in a pointer to the struct with numbits set to the max that is allocated.

From this, my understanding is that the CAPI should be updated with functions like the following:

MLIR_CAPI_EXPORTED int mlirIntegerAttrGetValueUInterop(MlirAttribute attr, apint_interop_t *interop);

The Python interface would call these new functions to receive an interop and the interface would convert the uint64_t array to interface with the py::int_ class via py::byte. Does this sound correct?

I also wanted to understand better how your proposed design helps with heap accesses, it's unclear to me where we could save on allocations. Thanks!

bababuck added a commit to bababuck/llvm-project that referenced this issue Mar 10, 2025
Fixes issue llvm#128072.

Allows for arbitrarily sized integers to be requested via Python.
bababuck added a commit to bababuck/llvm-project that referenced this issue Mar 20, 2025
Fixes issue llvm#128072.

Allows for arbitrarily sized integers to be requested via Python.
bababuck added a commit to bababuck/llvm-project that referenced this issue Mar 20, 2025
Fixes issue llvm#128072.

Allows for arbitrarily sized integers to be requested via Python.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue https://github.com/llvm/llvm-project/contribute mlir:python MLIR Python bindings
Projects
None yet
Development

No branches or pull requests

5 participants