Test Unicode string. #1511

fredericDelaporte · 2018-01-02T19:11:29Z

The tests do not currently check Unicode strings; here is a basic test for that.
I expect it to fail on Firebird and Oracle. I will add commits fixing the test setup (and the async regeneration) afterward.

fredericDelaporte · 2018-01-02T22:54:27Z

src/NHibernate.Test/TypesTest/StringTypeFixture.cs

+		[Test]
+		public void InsertUnicodeValue()
+		{
+			const string unicode = "길동 최고 新闻 地图 ます プル éèêëîïôöõàâäåãçùûü бджзй αβ ខគឃ ضذخ";


These are some characters taken from NHCH-43, followed by some European accented letters, some Cyrillic letters, two Greek ones, a handful of Khmer ones (likely the script most recently supported on Windows) and some Arabic ones (in my experience one of the hardest scripts to support because of right-to-left logic, though that is more a display problem than a storage one; Khmer was easier to deal with even before it was officially supported on Windows).

fredericDelaporte · 2018-01-02T22:57:24Z

build-common/teamcity-hibernate.cfg.xml

@@ -24,5 +24,6 @@
 			<property name="adonet.wrap_result_sets">false</property>

 			<property name="odbc.explicit_datetime_scale"></property>
+			<property name="oracle.use_n_prefixed_types_for_unicode"></property>


The TeamCity NAnt build logic silently skips any property it should adjust when that property is missing from this file, which left Oracle not properly set up for actually using Unicode with the DbType.String type. On our TeamCity agent, the Oracle default charset is an ASCII one, so we must use the "N"-prefixed char types to store Unicode strings.

fredericDelaporte · 2018-01-02T23:01:37Z

src/NHibernate.Config.Templates/FireBird.cfg.xml

@@ -22,6 +22,7 @@ for your own use before compile tests in VisualStudio.
 			Database=nhibernate;
 			User ID=SYSDBA;Password=masterkey;
 			MaxPoolSize=200;
+			charset=utf8;


The default charset for Firebird is "None", which in theory means that encoding and decoding are the application's business, Firebird being supposed to store the data and give it back "as is", perhaps in some binary sense. In our case, that goes wrong. I have not investigated why; I have just switched it to utf8 instead.

This requires recreating the database, which the TeamCity build already does for Firebird. For local tests, adjust your configuration files accordingly and run the TestDatabaseSetup test to fix this.

(Of course, this file is just the default template; the actual fix for TeamCity is the similar change made in the teamcity.build file.)

fredericDelaporte · 2018-01-02T23:04:40Z

src/NHibernate.Config.Templates/Oracle-Managed.cfg.xml

@@ -14,5 +14,6 @@ for your own use before compile tests in VisualStudio.
 		<property name="show_sql">false</property>
 		<property name="dialect">NHibernate.Dialect.Oracle10gDialect</property>
 		<property name="query.substitutions">true 1, false 0, yes 'Y', no 'N'</property>
+		<property name="oracle.use_n_prefixed_types_for_unicode">false</property>


To help those who install Oracle with a non-Unicode default charset, I have added this parameter to the sample file, so they just need to switch it to true, as explained in the instruction file in the same folder.
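
For illustration, a minimal sketch of that line once switched for an Oracle instance whose default charset is not Unicode. The property name is the one added in this diff; the rest of the template is assumed to stay unchanged.

```xml
<!-- Use the N-prefixed types (such as nvarchar2) instead of varchar2 for Unicode strings. -->
<property name="oracle.use_n_prefixed_types_for_unicode">true</property>
```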

fredericDelaporte · 2018-01-04T15:22:24Z

@hazzik, the remaining Oracle failure is a new issue. A cast to nvarchar2(4000) fails with ORA-00910, meaning the test has tried to oversize the string type. I think that when Oracle was recently re-installed on TeamCity, it lost a setting: extended string support.

I am going to enable it.

A bit of background to explain why I think this:

When NH-4062 (Properly handle Oracle Unicode support dual model) was done, the TeamCity build was not updated to set oracle.use_n_prefixed_types_for_unicode correctly. The TeamCity NHibernate agent needs it, since its Oracle is configured "the old default way", with an ASCII charset. Without this parameter, all string types use varchar2 instead of nvarchar2, which on TeamCity's current Oracle setup means a non-Unicode string type; hence the failure of the new test I have added here. But since no previous test actually required Unicode, this went unnoticed, and it also caused the remaining failing test here to use a varchar2(4000).

Before NH-4062, the NHibernate Oracle dialect assumed that nvarchar2 always had to be used for Unicode strings, which is wrong with modern setups and not recommended. (See for example here.) So before NH-4062, the failing test here was already using an nvarchar2(4000), but without triggering ORA-00910. This is only possible in two cases: either NLS_NCHAR_CHARACTERSET was not the default UTF16 but UTF8 instead, or MAX_STRING_SIZE was set to EXTENDED.

That is why I think the recent re-installation of Oracle on the build agent has lost a setting.

So we can fix the remaining failing test here by re-configuring Oracle on TeamCity.

By the way, that also means the maximum string sizes used by the Oracle dialects are not very good: depending on how the database is actually configured, they may be too large or too small. Since there is no fixed value for them, I guess it should be left to the user to define them properly if the NHibernate defaults cause issues. Currently, the only way to do this is to derive a custom dialect from an Oracle one and override RegisterCharacterTypeMappings. If only casting causes issues (as in the currently failing test case), configuring DefaultCastLength works too.

fredericDelaporte · 2018-01-04T20:51:02Z

> That is why I think the recent re-installation of Oracle on the build agent has lost a setting.

Well, that was the wrong conclusion. Another change has also occurred: #709, which changed the default cast length from 255 to 4000. That is the cause of this test failure, now that the tests are back to nvarchar2 instead of varchar2.

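So the simplest fix is to reduce DefaultCastLength to 2000 for the Oracle tests: with the default UTF16 national charset, each character takes two bytes, so 2000 characters is what fits in the 4000-byte limit that applies when MAX_STRING_SIZE is not EXTENDED. Configuring MAX_STRING_SIZE is quite cumbersome.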
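
For reference, a sketch of what such a reduction could look like in an Oracle test configuration file. It assumes that the dialect's DefaultCastLength is driven by the `query.default_cast_length` configuration property; check that name against the NHibernate version in use before relying on it.

```xml
<!-- Assumption: query.default_cast_length is the setting behind the dialect's DefaultCastLength. -->
<property name="query.default_cast_length">2000</property>
```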

fredericDelaporte · 2018-01-07T08:13:15Z

Rebased, some more comments added, Oracle config templates completed. I have done it in such a way that the builds have run on the first commit, to demonstrate that the tests are not properly set up for Unicode support.

hazzik · 2018-01-30T23:41:36Z

LGTM. Please resolve the conflict and merge.

fredericDelaporte · 2018-01-31T00:16:02Z

Rebased.

fredericDelaporte added c: Tests p: Lowest t: Bug labels Jan 2, 2018

fredericDelaporte force-pushed the Unicode branch from 0db37d8 to 8e046fe Compare January 6, 2018 15:14

fredericDelaporte requested a review from hazzik January 10, 2018 12:25

hazzik previously approved these changes Jan 30, 2018

fredericDelaporte added 3 commits January 31, 2018 01:10

cf998d1 Test Unicode string.

c03ea01 Support Unicode in Firebird tests

002ee91 Support Unicode in Oracle tests

fredericDelaporte dismissed hazzik’s stale review via 002ee91 January 31, 2018 00:15

fredericDelaporte force-pushed the Unicode branch from 3ba71ee to 002ee91 Compare January 31, 2018 00:15

fredericDelaporte added this to the 5.1 milestone Jan 31, 2018

hazzik approved these changes Jan 31, 2018

fredericDelaporte merged commit 59ff1df into nhibernate:master Jan 31, 2018

fredericDelaporte deleted the Unicode branch January 31, 2018 10:12

fredericDelaporte added the r: Fixed label Jan 31, 2018

fredericDelaporte mentioned this pull request Nov 9, 2018

Garbled when I insert entity data with chinese characters #1902 (Closed)

This was referenced Jul 12, 2021

Add Oracle to GitHub Actions #2848 (Merged)

Investigate why TeamCity Oracle builds so slow #2854 (Closed)
