Performance of import utility 1.7.1c
Posted: 14 Jul 2017, 18:26
Hi,
We are currently populating our EA model with data from ServiceNow.
To check the performance of the import utility, we are initially populating Technology_Product from the ServiceNow Software table (for testing purposes, only the name, description and manufacturer). Each manufacturer name is created as a Supplier instance. From the ServiceNow data (36000 rows in total, with 2600 unique supplier names) we generated an XML file of 36 MB. The XML has a simple_instance node for each supplier, as follows.
<simple_instance>
  <name>A supplier name</name>
  <type>Supplier</type>
</simple_instance>
Similarly, a simple_instance node is created for each technology product, as follows.
<simple_instance>
  <name>A product name</name>
  <type>Technology_Product</type>
  <own_slot_value>
    <slot_reference>name</slot_reference>
    <value value_type="string">A product name</value>
  </own_slot_value>
  <own_slot_value>
    <slot_reference>description</slot_reference>
    <value value_type="string">Some description goes here</value>
  </own_slot_value>
  <own_slot_value>
    <slot_reference>product_label</slot_reference>
    <value value_type="string">A supplier name::A product name</value>
  </own_slot_value>
  <own_slot_value>
    <slot_reference>supplier_technology_product</slot_reference>
    <value value_type="simple_instance">A supplier name</value>
  </own_slot_value>
</simple_instance>
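A minimal sketch of how nodes in this shape could be generated, assuming Python's standard xml.etree.ElementTree module. The helper names (supplier_instance, add_slot, product_instance) are hypothetical, and the enclosing root element is left out because it is not shown above.

```python
import xml.etree.ElementTree as ET


def supplier_instance(name):
    """Build a <simple_instance> node for one Supplier (hypothetical helper)."""
    inst = ET.Element("simple_instance")
    ET.SubElement(inst, "name").text = name
    ET.SubElement(inst, "type").text = "Supplier"
    return inst


def add_slot(inst, slot, value, value_type="string"):
    """Append one <own_slot_value> child holding a single value."""
    osv = ET.SubElement(inst, "own_slot_value")
    ET.SubElement(osv, "slot_reference").text = slot
    ET.SubElement(osv, "value", {"value_type": value_type}).text = value


def product_instance(name, description, supplier):
    """Build a <simple_instance> node for one Technology_Product."""
    inst = ET.Element("simple_instance")
    ET.SubElement(inst, "name").text = name
    ET.SubElement(inst, "type").text = "Technology_Product"
    add_slot(inst, "name", name)
    add_slot(inst, "description", description)
    add_slot(inst, "product_label", supplier + "::" + name)
    # The supplier is referenced by instance name, not copied as a string.
    add_slot(inst, "supplier_technology_product", supplier, "simple_instance")
    return inst


if __name__ == "__main__":
    node = product_instance("A product name",
                            "Some description goes here",
                            "A supplier name")
    # ElementTree escapes &, < and > in text content when serializing.
    print(ET.tostring(node, encoding="unicode"))
```

Building the nodes through an XML library rather than string concatenation means special characters in product names and descriptions are escaped automatically.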
Once the XML is generated, our program posts it to the import utility to trigger an XML activity. To transform this XML into a Python script, we use testtransform.xsl with some customizations. We have noticed the following issues.
Problem #1 with unicode values.
==========================
Some of our data contains non-ASCII (Unicode) characters. For example, one manufacturer name is:
Компания КРИПТО-ПРО
If you try to print this name in the Protege script console as follows,
print "Компания КРИПТО-ПРО"
a lexical error is reported. Using single quotes instead of double quotes executes properly, but the XSL uses double quotes when forming the Python script, so the import utility breaks at this point.
To cross-check this issue, we created an Excel import activity using an Excel file containing only this particular Unicode value, and mapped the column containing it to create an instance of Supplier. When the activity was triggered, the supplier name was created properly. On examining the generated Python script, we noticed that only the non-ASCII data had been converted to Unicode escape sequences and passed to the Python script as u"\u041A\u043E\u043C\u043F\u0430\u043D\u0438\u044F \u041A\u0420\u0418\u041F\u0422\u041E-\u041F\u0420\u041E". We therefore adjusted the program that forms the XML to check whether incoming text contains non-ASCII characters and, if so, convert them to the corresponding escape sequences. We also adjusted the XSL code that calls the standardFunctions.py methods and added the encoding declaration # -*- coding: UTF-8 -*-, after which the Unicode issue was resolved.
We would like to know whether this approach is appropriate or whether there is a better practice. If so, please suggest the best way.
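The conversion step described above can be sketched roughly as follows. This is a minimal illustration in plain Python and not the code the import utility itself uses; the function name to_unicode_literal is hypothetical. It assumes the target script accepts Python 2 style u"..." literals, as in the Excel-generated script quoted above.

```python
def to_unicode_literal(text):
    """Return text as an ASCII-only u"..." literal (hypothetical helper).

    Printable ASCII is kept as-is, backslashes and double quotes are
    escaped, and every other character becomes a \\uXXXX escape
    (or \\UXXXXXXXX for code points above U+FFFF).
    """
    parts = []
    for ch in text:
        code = ord(ch)
        if ch in ('\\', '"'):
            # Escape characters that would otherwise break the literal.
            parts.append('\\' + ch)
        elif 0x20 <= code < 0x7F:
            parts.append(ch)
        elif code <= 0xFFFF:
            parts.append('\\u%04X' % code)
        else:
            parts.append('\\U%08X' % code)
    return 'u"' + ''.join(parts) + '"'


if __name__ == "__main__":
    print(to_unicode_literal(u"Компания КРИПТО-ПРО"))
```

Because the resulting literal contains only ASCII characters, it does not depend on the encoding in which the generated script is read.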
Problem #2 slow import process.
===========================
After resolving the Unicode issue, we tested with a few rows containing Unicode values and everything worked as expected. We then wanted to test importing all 2600 suppliers and 36000 products. In our dev environment, we carried out all testing on a file-based Protege snapshot of our existing database-backed, server-based Protege project. The import utility uses an RMI connection to this project. The dev system has enough memory and CPU utilization is fine. Nobody else is accessing anything related to EA on that server.
We triggered the import activity using our program, and the process started around 6:30 AM PST yesterday (7/13). In around 3 hours, all supplier instances and the first 10000 products were created. However, the import utility is still running (more than 24 hours later) and so far 23000 products have been created. Is this due to using a file-mode Protege project? Once this process completes, we will also cross-check against the database-based server project. Is it usual for the import utility to take this long? Please suggest any better way to improve performance.
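To put the slowdown in numbers, here is a rough throughput estimate from the figures above. It assumes exactly 24 hours elapsed at the 23000-product mark (the run had actually been going slightly longer), so the results are approximations only.

```python
# Figures taken from the post; the 24-hour mark is an approximation.
products_total = 36000
products_after_3h = 10000
products_after_24h = 23000

# Products per hour in the first 3 hours (the 2600 suppliers were also
# created in this window, so the real rate was somewhat higher).
rate_first = products_after_3h / 3.0

# Products per hour over the following 21 hours.
rate_later = (products_after_24h - products_after_3h) / (24.0 - 3.0)

# Hours still needed if the later rate holds and does not degrade further.
remaining_hours = (products_total - products_after_24h) / rate_later

if __name__ == "__main__":
    print("first 3 hours : %.0f products/hour" % rate_first)
    print("next 21 hours : %.0f products/hour" % rate_later)
    print("remaining     : %.1f hours at the later rate" % remaining_hours)
```

That is roughly 3300 products per hour at the start against roughly 620 per hour afterwards, so the import rate dropped by about a factor of five as the project grew.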
Expecting some suggestions from the community on the following
=====================================================
We hope somebody has already tried importing data from the ServiceNow CMDB into the EA meta model. If so, could you please share the mapping between CMDB tables and EA classes?
Thanks.
Prem