Duplicated characters near comma #2377

### Bug
We are extracting text from this PDF with docling, using the configuration shown under "Steps to reproduce". The extracted text contains duplicated characters.

#### Text in PDF
<img width="1621" height="180" alt="Image" src="https://github.com/user-attachments/assets/465cb2f9-3927-4cb0-a39b-6bfa78deb9b1" />


#### Text Extracted
> - (4) Given the sensitivity of personal electronic health data, this Regulation seeks to provide sufficient safeguards at both Union and national level to ensure a high degree of data protection, security, y, confidentiality and ethical use. Such safeguards are necessary to promote trust in safe handling of electronic health data of natural persons for primary use and secondary use as defined in this Regulation.

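Here the output contains `security, y, ` instead of `security,`: the last character of the word is repeated after the comma, together with a second comma.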
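To check whether the same artifact occurs elsewhere in the extracted text, a small standard-library script along these lines might help. It is only a sketch: `find_duplicated_tails` is a hypothetical helper, not part of docling, and it assumes the artifact always has the shape seen above (a word, a punctuation mark, then a one-to-three-character tail of that word followed by the same punctuation mark).

```python
import re

# Hypothetical helper (not part of docling): matches "<word><punct> <tail><punct>",
# e.g. "security, y,", where <tail> is at most three characters long.
_PATTERN = re.compile(r"(\w+)([,.;:]) (\w{1,3})\2(?=\s|$)")


def find_duplicated_tails(text: str) -> list[str]:
    """Return fragments where the piece after the punctuation repeats the end of the word."""
    hits = []
    for match in _PATTERN.finditer(text):
        word, _, tail = match.groups()
        # Keep only matches where the repeated piece is a proper suffix of the word,
        # so ordinary lists such as "a, b, c" are not reported.
        if word.endswith(tail) and len(tail) < len(word):
            hits.append(match.group(0))
    return hits


sample = "a high degree of data protection, security, y, confidentiality and ethical use."
print(find_duplicated_tails(sample))
```

On the sentence quoted above this reports the single fragment `security, y,`. Running it over the full Markdown output would show whether other words are affected, though the pattern is a heuristic and may miss artifacts of a different shape.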



### Steps to reproduce

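```python
from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.accelerator_options import AcceleratorDevice, AcceleratorOptions
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    TableFormerMode,
    TableStructureOptions,
    TesseractOcrOptions,
)
from docling.document_converter import PdfFormatOption

# CustomDocumentConverter and DOCLING_ALLOWED_FORMATS come from our own code
# and are not shown here. The snippet runs inside a method, hence `self`.

# OCR options are configured, but OCR itself is disabled below (do_ocr=False).
ocr_options = TesseractOcrOptions()
device = AcceleratorDevice.AUTO

accelerator_options = AcceleratorOptions(
    num_threads=2,
    device=device,
)

pipeline_options = PdfPipelineOptions(
    do_ocr=False,
    ocr_options=ocr_options,
    table_structure_options=TableStructureOptions(
        mode=TableFormerMode.FAST, do_cell_matching=True
    ),
    do_table_structure=True,
    accelerator_options=accelerator_options,
    artifacts_path=self.docling_artifacts_path or None,
)

# The PDF is parsed with the pypdfium2 backend.
converter = CustomDocumentConverter(
    allowed_formats=DOCLING_ALLOWED_FORMATS,
    format_options={
        InputFormat.PDF: PdfFormatOption(
            backend=PyPdfiumDocumentBackend,
            pipeline_options=pipeline_options,
        ),
    },
)

# Then invoked like this (arguments omitted):
md_text = converter.convert()
```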

### Docling version

```
docling                                  2.54.0
docling-core                             2.48.4
docling-ibm-models                       3.9.1
docling-parse                            4.5.0
```

### Python version
`Python 3.12.7`

---

### PDF File needed to reproduce this bug

[PDF.pdf](https://github.com/user-attachments/files/22682018/PDF.pdf)

