
Transliteration table has several mistakes and more gaps, should use standard library #37802



Open
2 of 5 tasks
Crissov opened this issue Jul 26, 2023 · 12 comments
Labels
Area: SEO Component: Url Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Priority: P3 May be fixed according to the position in the backlog. Reported on 2.4.x Indicates original Magento version for the Issue report. Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch

Comments

@Crissov

Crissov commented Jul 26, 2023

Preconditions and environment

Steps to reproduce

  1. Set up a new product whose name includes the characters ſ (long s), þ (thorn) and ð (eth).
  2. Save it so that a slugified URL key is generated for SEO.

Expected result

In the generated URL key,

  • ſ becomes s
  • þ becomes th
  • ð becomes d, although dh and even th would also be acceptable

Actual result

  • ſ becomes z
  • þ becomes p
  • ð is removed

Additional information

These are just some mistakes I spotted easily by looking at the file. I’m fairly sure there are also errors (or questionable choices) in the romanisation of Cyrillic, Greek, Hebrew and Devanagari. The selection of fewer than 500 transliterated characters seems arbitrary, so people have created modules to properly support languages like Romanian and Vietnamese.

For Japanese alone, magento2-jp already uses PHP’s Transliterator, which is the right tool for the job. Its data comes from ICU, which in turn uses CLDR data. Both are maintained by Unicode, so the data is as reliable as it gets and will continue to be improved.

If Transliterator is not to be used for some reason, Magento should at least use the Unicode data for Latin-ASCII and …-Latn.

PS: Ideally, Magento would support setting a language for a store view. That setting would then be respected for cases like German umlauts (ä → ae) that deviate from the script default (ä → a) – CLDR offers de-ASCII for that; also see #23292. Administrators should also be able to opt into UTF-8 percent encoding in all cases, but let’s keep this a bug report and not a feature request.
PPS: This won’t cover cases like ½″, which would ideally become half-inch but will at best become 1-2. It also won't cover 0.5 cm, which would be better as 5mm than 0-5-cm.
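For illustration only, the expected mappings can be reproduced with a few lines of Python that combine Unicode data with a small fallback table. This is a minimal sketch, not Magento's PHP code and not ICU's Latin-ASCII transform. It makes the following assumptions:

- A hypothetical `EXTRA` table covers þ and ð, which have no Unicode decomposition.
- A hypothetical `slugify()` helper does the conversion.
- Everything else relies on NFKD normalisation, which already maps ſ (U+017F) to s.

```python
import re
import unicodedata

# Hypothetical fallback table for letters that have no Unicode decomposition.
# Values follow the "Expected result" in this report, not CLDR's Latin-ASCII rules.
EXTRA = {"þ": "th", "Þ": "Th", "ð": "d", "Ð": "D"}


def slugify(name: str) -> str:
    """Hypothetical URL-key generator: fallback table, then NFKD, then ASCII."""
    # 1. Replace letters that normalisation cannot handle.
    text = "".join(EXTRA.get(ch, ch) for ch in name)
    # 2. NFKD maps compatibility characters such as ſ to s and splits
    #    accented letters into a base letter plus combining marks.
    text = unicodedata.normalize("NFKD", text)
    # 3. Drop everything that is still not ASCII (e.g. the combining marks).
    text = text.encode("ascii", "ignore").decode("ascii")
    # 4. Lowercase, collapse each run of characters other than a-z/0-9 into
    #    one hyphen, and trim hyphens from both ends.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


print(slugify("ſ þ ð"))  # s-th-d
```

A table-plus-normalisation approach like this only helps with Latin-based letters. For Cyrillic, Greek, Hebrew or Devanagari, the ICU/CLDR transforms mentioned above remain the better choice.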

Release note

No response

Triage and priority

  • Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.
@m2-assistant

m2-assistant bot commented Jul 26, 2023

Hi @Crissov. Thank you for your report.
To speed up processing of this issue, make sure that the issue is reproducible on a vanilla Magento instance by following the Steps to reproduce. To deploy a vanilla Magento instance on our environment, add a comment to the issue:

@magento give me 2.4-develop instance
Join Magento Community Engineering Slack and ask your questions in #github channel.
⚠️ According to the Magento Contribution requirements, all issues must go through the Community Contributions Triage process. Community Contributions Triage is a public meeting.
🕙 You can find the schedule on the Magento Community Calendar page.
📞 The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, join the Community Contributions Triage session to discuss the appropriate ticket.

@m2-assistant

m2-assistant bot commented Jul 27, 2023

Hi @engcom-Bravo. Thank you for working on this issue.
To make sure that the issue has enough information and is ready for development, please read and check the following instructions: 👇

  • 1. Verify that the issue has all the required information (Preconditions, Steps to reproduce, Expected result, Actual result).
  • 2. Verify that the issue has a meaningful description and provides enough information to reproduce it.
  • 3. Add an Area: XXXXX label to the ticket, indicating the functional areas it may be related to.
  • 4. Verify that the issue is reproducible on the 2.4-develop branch.
    - Add the comment @magento give me 2.4-develop instance to deploy a test instance on Magento infrastructure.
    - If the issue is reproducible on the 2.4-develop branch, please add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add a comment saying so, close the issue, and stop the verification process here.
  • 5. Add the label Issue: Confirmed once verification is complete.
  • 6. Make sure that the automatic system confirms that the report has been added to the backlog.

@engcom-Bravo engcom-Bravo added the Reported on 2.4.x Indicates original Magento version for the Issue report. label Jul 27, 2023
@engcom-Bravo
Contributor

@magento give me 2.4-develop instance

@magento-deployment-service

Hi @engcom-Bravo. Thank you for your request. I'm working on a Magento instance for you.


@engcom-Bravo
Contributor

Hi @Crissov,

Thank you for reporting and collaboration.

We verified the issue on a Magento 2.4-develop instance, and it is reproducible. Kindly refer to the screenshots.

Steps to reproduce

  • Set up a new product whose name includes the characters ſ (long s), þ (thorn) and ð (eth).
  • Save it so that a slugified URL key is generated for SEO.
[Screenshot 2023-07-28 at 2:51:46 PM]
  • ſ becomes z
  • þ becomes p
  • ð is removed, and only a hyphen (-) remains in its place

We referred to https://en.wikipedia.org/wiki/S; according to that page, ſ should become s.

Hence, we are confirming the issue.

Thanks.

@engcom-Bravo engcom-Bravo added Area: SEO Component: Url Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Priority: P3 May be fixed according to the position in the backlog. labels Jul 31, 2023
@github-jira-sync-bot

✅ Jira issue https://jira.corp.adobe.com/browse/AC-9226 is successfully created for this GitHub issue.

@m2-assistant

m2-assistant bot commented Jul 31, 2023

✅ Confirmed by @engcom-Bravo. Thank you for verifying the issue.
Issue Available: @engcom-Bravo, you will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

@engcom-Bravo
Contributor

Hi @Crissov,

Thanks for your reporting and collaboration.

We tried to reproduce the issue on the latest 2.4-develop instance, and it is still reproducible. Kindly refer to the screenshots.

[Screenshot]
  • ſ becomes z
  • þ becomes p
  • ð is removed

Hence, we are confirming the issue.

Thanks.

@engcom-Bravo engcom-Bravo removed Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch Progress: ready for dev Priority: P3 May be fixed according to the position in the backlog. labels Apr 15, 2025
@engcom-Bravo engcom-Bravo added Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch Priority: P3 May be fixed according to the position in the backlog. Area: SEO and removed Area: SEO Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed labels Apr 15, 2025
@ct-prd-projects-boards-automation ct-prd-projects-boards-automation bot moved this to Ready for Development in Low Priority Backlog Apr 15, 2025
@github-jira-sync-bot github-jira-sync-bot removed the Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed label Apr 15, 2025
@github-jira-sync-bot

Unfortunately, not enough information was provided to create a Jira ticket. Please make sure you added the following label(s): Reproduced on 2.4.x, ^Area:.*

Once all required labels are present, please add Issue: Confirmed label again.

@engcom-Bravo engcom-Bravo added the Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed label Apr 15, 2025
@github-jira-sync-bot

❌ Cannot export the issue. This GitHub issue is already linked to Jira issue(s): https://jira.corp.adobe.com/browse/AC-9226


Projects
Status: Ready for Development
Development

No branches or pull requests

3 participants