Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test Unicode string. #1511

Merged
merged 3 commits into from
Jan 31, 2018
Merged

Conversation

fredericDelaporte
Copy link
Member

@fredericDelaporte fredericDelaporte commented Jan 2, 2018

The tests do not check Unicode string, here is a basic test for this.
I expect it to fail for Firebird and Oracle. I will add test fix commits (and async regen) afterward.

[Test]
public void InsertUnicodeValue()
{
const string unicode = "길동 최고 新闻 地图 ます プル éèêëîïôöõàâäåãçùûü бджзй αβ ខគឃ ضذخ";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is some characters taken from NHCH-43, then some European accentuated letters, some Cyrillic letters, two Greek ones, a bunch of Khmer ones (likely the script supported the latest on windows) and some Arabic ones (which is in my experience one of the worst to support due to right-to-left logic, but that is more a trouble for display than for storage; Khmer was easier to deal with even when not yet officially supported on Windows).

@@ -24,5 +24,6 @@
<property name="adonet.wrap_result_sets">false</property>

<property name="odbc.explicit_datetime_scale"></property>
<property name="oracle.use_n_prefixed_types_for_unicode"></property>
Copy link
Member Author

@fredericDelaporte fredericDelaporte Jan 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TeamCity Nant build logic just does nothing of properties to adjust when they are lacking in this file, thus causing Oracle to not be properly set up for actually using Unicode with DbType.String type. On our TeamCity agent, its default charset is an ascii one, and so we must use the "N" prefixed char types for storing Unicode strings.

@@ -22,6 +22,7 @@ for your own use before compile tests in VisualStudio.
Database=nhibernate;
User ID=SYSDBA;Password=masterkey;
MaxPoolSize=200;
charset=utf8;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default charset for FireBird is "None", theoretically meaning that it is the application business to encode/decode it, Firebird being supposed to stores and gives data back "as is", maybe in some binary sens. In our case, that goes wrong. I have not investigated why, I have just switched that to utf8 instead.

This requires to recreate the database, which the TeamCity build does for Firebird. For local tests, adjust your configuration files accordingly and run the TestDatabaseSetup test for fixing this.

(Of course this file here is just the default template, the actual fix for TeamCity is the similar change done in teamcity.build file.)

@@ -14,5 +14,6 @@ for your own use before compile tests in VisualStudio.
<property name="show_sql">false</property>
<property name="dialect">NHibernate.Dialect.Oracle10gDialect</property>
<property name="query.substitutions">true 1, false 0, yes 'Y', no 'N'</property>
<property name="oracle.use_n_prefixed_types_for_unicode">false</property>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For helping those installing Oracle with an non-Unicode default charset, I have added this parameter in the sample file. So they will just need to switch it to true, as explained in the instruction file in the same folder.

@fredericDelaporte
Copy link
Member Author

fredericDelaporte commented Jan 4, 2018

@hazzik, the remaining Oracle failure is a new issue. A cast to nvarchar2(4000) fails with ORA-00910, meaning the test has tried to oversize the string type. I think that when Oracle was recently re-installed on TeamCity, it has lost a setting: extended string support.

I am going to enable it.

A bit of background for explaining why I think this:

When NH-4062 (Properly handle Oracle Unicode support dual model) was done, the TeamCity build was not properly changed for correctly setting oracle.use_n_prefixed_types_for_unicode. The TeamCity NHibernate agent needs it since its Oracle is configured "the old default way", with an ASCII charset. The lack of this parameter results in all string types using varchar2 instead of nvarchar2, which on TeamCity's Oracle current setup, means using an non Unicode string type. Thus the failure of the new test I have added here. But since no previous tests were actually requiring Unicode, this has gone unnoticed, causing the remaining failing test here to use a varchar2(4000).

Before NH-4062, NHibernate Oracle dialect was assuming nvarchar2 has to be always used for Unicode string, which is wrong with modern setup and not recommended. (See by example here.) So before NH-4062, the failing test here was already using a nvarchar2(4000), but without triggering the ORA-00910. This is only possible under two cases: either the NLS_NCHAR_CHARACTERSET was not the default UTF16 but UTF8 instead, or MAX_STRING_SIZE was set to EXTENDED.

That is why I think the recent re-installation of Oracle on the build agent has lost a setting.

So we can fix the remaining failing test here by re-configuring Oracle on TeamCity.

But by the way, that also means the Oracle dialects maximal sizing of string is not very good: depending on how the database is actually configured, they may be too large or too small. Since there is no fixed setting for them, I guess it should be left to the user to define them properly if the NHibernate defaults causes issues. The only way for this is to currently derive a custom dialect from Oracle and override RegisterCharacterTypeMappings. If only casting (like in current failing test case) causes issues, configuring the DefaultCastLength works too.

@fredericDelaporte
Copy link
Member Author

That is why I think the recent re-installation of Oracle on the build agent has lost a setting.

Well, bad conclusion. Another change has also occurred: #709, changing the default cast length from 255 to 4000. That is the cause of this test failure, once gone back to nvarchar2 instead of varchar2.

So the simplest is to reduce DefaultCastLength to 2000 for Oracle tests. The configuration of MAX_STRING_SIZE is quite cumbersome.

@fredericDelaporte
Copy link
Member Author

Rebased, some more comments added, Oracle config templates completed. I have done it in such a way that the builds have been run on the first commit, for demonstrating the tests are not properly setup for Unicode support.

hazzik
hazzik previously approved these changes Jan 30, 2018
@hazzik
Copy link
Member

hazzik commented Jan 30, 2018

LGTM. Please resolve the conflict and merge.

@fredericDelaporte
Copy link
Member Author

Rebased.

@fredericDelaporte fredericDelaporte added this to the 5.1 milestone Jan 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants