Odd table shapes issue #1405

aanastasiou · 2024-06-11T14:34:02Z

I am pre-processing a large number of .docx documents with really oddly shaped tables containing text that has to be extracted verbatim.

As useful as python-docx has been in this task, a subset of those documents revealed a small bug in this line.

This PR fixes cases of odd table shapes where the strategy of populating a cell with the value of the previous cell (e.g. in the case of row/cell merges) fails, because there simply has not been a 'previous cell' yet.

Please note, I would be glad to contribute a test case as well, but this might take a bit more time: I need to track down the exact table (within the XML) that causes the bug and create an "equivalent" test case.

Hope this helps.

scanny · 2024-06-11T16:07:42Z

@aanastasiou there was a recent update that addressed the "skipped-cells" condition that is actually a legitimate (although relatively unusual) table state.

If you use Table.rows to get rows and then iterate _Row.cells to get each cell you shouldn't have a problem there.

Depending on your needs for column alignment you may want to use _Row.grid_cols_before and .grid_cols_after to discover the empty leading and trailing cells.

There is also a new _Cell.grid_span property so you can tell how many grid-cells a horizontally-merged cell occupies.

I'm not sure what we'll do with Table._cells. It's possible that collection will be deprecated or perhaps we'll reimplement it based on the new "skipped-cell-aware" code, but for now it is probably better to avoid it in favor of the new methods.

aanastasiou · 2024-06-25T14:17:51Z

@scanny thank you very much for the prompt response. This was using the latest python-docx from PyPI. Would this recent update be in the version on GitHub rather than the one on PyPI? Thanks for the rest of the information, it's good to know for our next code revision.

scanny · 2024-06-25T16:26:48Z

This change appears in v1.1.2, which is the current PyPI version, released on May 1, 2024:
https://pypi.org/project/python-docx/
f4a48b5

aanastasiou · 2024-06-28T12:36:12Z

@scanny This is the version that I used (and which eventually led me to file this PR).

scanny · 2024-06-28T16:59:26Z

Show me the client code that isn't working the way you want.

aanastasiou · 2024-07-12T10:20:00Z

@scanny The PR contains the exact problem that I dealt with (and how I dealt with it). What might take longer is locating the exact document that causes this behaviour.

scanny · 2024-07-12T18:35:15Z

@aanastasiou the idea there is not that this problem with table._cells is fixed for your case, but rather that you should no longer need to use table._cells and can use something like (c for row in table.rows for c in row.cells).

If you can post the code you're using to traverse cells and which gives rise to the error you mention I expect I'll be able to describe how to modify it to avoid any exceptions for uneven row lengths.
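A minimal sketch of the traversal suggested above, written as a generator that walks rows and then cells instead of touching table._cells. The name iter_cell_texts and the stand-in table are hypothetical; only .rows, .cells and .text are assumed, which is what the generator expression in the comment uses.

```python
from types import SimpleNamespace


def iter_cell_texts(table):
    """Yield (row_index, text) for every cell, row by row.

    Rows may have different lengths; no rectangular grid is assumed.
    """
    for row_idx, row in enumerate(table.rows):
        for cell in row.cells:
            yield row_idx, cell.text


# Hypothetical stand-in for a table with uneven row lengths; with
# python-docx, pass a real Table object instead.
uneven_table = SimpleNamespace(
    rows=[
        SimpleNamespace(cells=[SimpleNamespace(text="a")]),
        SimpleNamespace(cells=[SimpleNamespace(text="b"), SimpleNamespace(text="c")]),
    ]
)
flat = list(iter_cell_texts(uneven_table))
```

Because each row is traversed independently, a short row simply contributes fewer items rather than raising an error.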

Fixes cases of odd table shapes were the strategy of populating a cel…

972e71a

…l with the value of the previous cell (e.g. in the case of row/cell merges) fails, because there simply has not been a 'previous cell' yet

aanastasiou closed this Jun 25, 2024
