UTF-8 support

Stepan Karandin
Posts: 17
Joined: 24 Jul 2017, 11:41

Hi, folks

We're done with the XML preparation, but the problem is the Cyrillic text in it.

According to
https://www.enterprise-architecture.org ... CII#p29234
importEssentialInstances.xsl and standardFunctions.py should be updated.

In the latest available release (6.1.1), both files are older than the versions Jonathan posted earlier.

Could somebody please publish a version with the UTF-8 issue fixed?
Stepan Karandin
Posts: 17
Joined: 24 Jul 2017, 11:41

Some update.

Tomcat has to be configured properly to get UTF-8 encoded XML:

essential_import_utility/WEB-INF/web.xml

Code: Select all

    <filter>
        <filter-name>setCharacterEncodingFilter</filter-name>
        <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
        <async-supported>true</async-supported>
    </filter>


    <filter-mapping>
        <filter-name>setCharacterEncodingFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

So now we have well-formed XML at the service, but Python is screwing up.

Code: Select all

Source information transformed successfully.
Created new instance of class: Project_Activity
Updated instance via external repository reference, on class: Project_Activity
.....
Updated instance via external repository reference, on class: Project_Activity

Script Exception:javax.script.ScriptException: Traceback (innermost last):
  (no code object) at line 0
SyntaxError: ('invalid syntax', ('<unknown>', 1, 62, 'theInstance = EssentialGetInstance("Project_Activity", "", ""5AB8@>20=85 8=B5@D59A0 "@01;HCB8=30", "", "***")'))
        at com.sun.script.jython.JythonScriptEngine.compileScript(JythonScriptEngine.java:280)
        at com.sun.script.jython.JythonScriptEngine.eval(JythonScriptEngine.java:169)

Both .py files have been updated with:

Code: Select all

# coding: utf8

So, how can I debug this? It looks like Python is working in Unicode, not UTF-8.
jonathan.carter
Posts: 1087
Joined: 04 Feb 2009, 15:44

Hi Stepan,

All of our ‘supporting’ libraries, such as standardFunctions.py, already have the UTF-8 encoding statement at the top of the script file.

Indeed, we have been importing Chinese and Arabic for many versions of the Import Utility via the Excel interface, which depends on the same scripts.

You should not need to make any changes to the web.xml file to support UTF-8.

The tricky part is making sure that your UTF-8 characters go into the XML in the ‘correct’ format.

Firstly, when passing string values into the Python operations of standardFunctions.py in the XSL template, you need to prefix each Cyrillic string literal with u, as in u'...', to tell Python that this is a Unicode string.

e.g. “Essential” is passed in as u'Essential'. You may need to use the Unicode character codes, such as \u4E2D, but you can get the XSL to handle that for you.
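
As a rough illustration of the u'...' form with \uNNNN codes, here is a minimal Python sketch. The helper name to_unicode_literal is hypothetical and is not part of standardFunctions.py; it only shows what the generated literal could look like, and the same mapping could be produced in the XSL instead.

```python
def to_unicode_literal(text):
    """Build a u'...' literal in which every non-ASCII character is a \\uNNNN escape."""
    parts = []
    for ch in text:
        code = ord(ch)
        if ch in ("\\", "'"):
            # Escape the characters that would otherwise break the literal.
            parts.append("\\" + ch)
        elif 32 <= code < 127:
            # Printable ASCII passes through unchanged.
            parts.append(ch)
        elif code <= 0xFFFF:
            parts.append("\\u%04x" % code)
        else:
            # Characters outside the Basic Multilingual Plane need the long form.
            parts.append("\\U%08x" % code)
    return "u'" + "".join(parts) + "'"


literal = to_unicode_literal(u"Тест")
print(literal)  # u'\u0422\u0435\u0441\u0442'
```

Because the result is plain ASCII, it should survive whatever encoding the XML passes through on its way to the script engine.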

From your last code snippet, it looks like the characters are not being interpreted correctly by Python. Try sending them in as \uNNNN escapes and watch out for character-escaping. In particular, you may need to use \\uNNNN in your source XML if the value is in double quotes, as a plain \uNNNN gets some crazy partial-escape. Certainly, this tripped me up just this week!
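
One way to narrow this down is to compile a generated statement on its own before it reaches the script engine. The sketch below assumes you can get at the statement text as a string. check_statement and escape_for_double_quotes are hypothetical helpers, and the instance name is made up; only the shape of the EssentialGetInstance call is taken from the trace above.

```python
def check_statement(source):
    """Return None if source compiles, else (line, column, text) of the syntax error."""
    try:
        compile(source, "<generated>", "exec")
        return None
    except SyntaxError as err:
        return (err.lineno, err.offset, err.text)


def escape_for_double_quotes(value):
    """Escape backslashes and double quotes so value fits inside a "..." literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


TEMPLATE = 'theInstance = EssentialGetInstance("Project_Activity", "", "%s", "", "***")'
name = 'Interface "Test" check'  # made-up instance name containing double quotes

# Unescaped quotes end the string literal early, so compilation fails.
print(check_statement(TEMPLATE % name))
# With the quotes escaped, the same statement compiles.
print(check_statement(TEMPLATE % escape_for_double_quotes(name)))  # None
```

compile() only parses the text, so EssentialGetInstance does not have to be defined for the check to work.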

Jonathan
Essential Project Team