Performance of import utility 1.7.1c

mdpremkumar
Posts: 26
Joined: 17 Apr 2017, 09:06

Hi,

We are currently in the process of populating our EA model with data from ServiceNow.
To check the performance of the import utility, we are initially populating data from the Software table into Technology_Product (for testing purposes, just the name, description and manufacturer). The name of the manufacturer will be created as a Supplier instance. From the ServiceNow data (36000 rows in total, with 2600 unique supplier names) we have generated an XML file of around 36 MB. The XML has a simple_instance node for each supplier, as follows:

<simple_instance>
  <name>A supplier name</name>
  <type>Supplier</type>
</simple_instance>

Likewise, a simple_instance node is created for each technology product, as follows:
<simple_instance>
  <name>A product name</name>
  <type>Technology_Product</type>
  <own_slot_value>
    <slot_reference>name</slot_reference>
    <value value_type="string">A product name</value>
  </own_slot_value>
  <own_slot_value>
    <slot_reference>description</slot_reference>
    <value value_type="string">Some description goes here</value>
  </own_slot_value>
  <own_slot_value>
    <slot_reference>product_label</slot_reference>
    <value value_type="string">A supplier name::A product name</value>
  </own_slot_value>
  <own_slot_value>
    <slot_reference>supplier_technology_product</slot_reference>
    <value value_type="simple_instance">A supplier name</value>
  </own_slot_value>
</simple_instance>

Once the XML is generated, our program posts it to the import utility in order to trigger an XML import activity. To transform this XML into a Python script, we use testtransform.xsl with some customizations. We have noticed the following issues.

Problem #1: unicode values
==========================
Some of our data contain unicode values. For example, there is a manufacturer name as follows:
Компания КРИПТО-ПРО

If you try to print this name in the Protege script console, as follows,
print "Компания КРИПТО-ПРО"
a lexical error is reported. Using single quotes instead of double quotes executes properly, but the XSL uses double quotes when forming the Python script, so the import utility breaks at this point.

To cross-check this issue, we created an Excel import activity using an Excel file containing only this particular unicode value and mapped the column containing it to create a Supplier instance. When the activity was triggered, the supplier name was created properly. On examining the generated Python script, we noticed that only the unicode data had been converted to its unicode escape sequences and passed to the Python script as u"\u041A\u043E\u043C\u043F\u0430\u043D\u0438\u044F \u041A\u0420\u0418\u041F\u0422\u041E-\u041F\u0420\u041E". We therefore adjusted the program that forms our XML to check whether incoming text contains unicode characters and, if so, convert it to the corresponding escape sequences. After this we also adjusted the XSL code that calls the standardFunctions.py methods, adding # -*- coding: UTF-8 -*-, and the unicode issue was resolved.
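
To illustrate the check we apply (our actual XML-building program is written in C#, and the helper name below is made up), a minimal Python 2 sketch would be:

Code: Select all

# -*- coding: UTF-8 -*-
# Hypothetical helper: replace each non-ASCII character with its \uXXXX
# escape so that the generated Python script contains only ASCII literals.
def escape_non_ascii(text):
    out = []
    for ch in text:
        if ord(ch) > 127:
            out.append(u"\\u%04X" % ord(ch))
        else:
            out.append(ch)
    return u"".join(out)

print escape_non_ascii(u"Компания КРИПТО-ПРО")
# -> \u041A\u043E\u043C\u043F\u0430\u043D\u0438\u044F \u041A\u0420\u0418\u041F\u0422\u041E-\u041F\u0420\u041E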

We would like to know whether this approach is proper or whether other best practices exist. If so, please suggest the best way.

Problem #2: slow import process
===============================
After resolving the unicode issue, we tested with a few rows containing unicode values and everything worked as expected. We then wanted to test importing all 2600 suppliers and 36000 products. In our dev environment, we carried out all testing on a Protege file-based snapshot of the existing database-backed, server-based Protege project. The import utility uses an RMI connection to this project. The dev system has enough memory, CPU utilisation is fine, and nobody else is accessing anything related to EA on that server.

We triggered the import activity using our program and the process started around 6:30 AM PST yesterday (7/13). In around 3 hours, all supplier instances and the first 10000 products were created, but the import utility is still running (more than 24 hours now) and as of this writing 23000 products have been created. Is this due to using a file-mode Protege project? Once this process completes we will also cross-check against a database-backed server project. Is it usual for the import utility to take this much time? Please suggest if there is a better way to increase performance.

Expecting some suggestions from the community on the following
=====================================================
We hope somebody has already tried importing data from the ServiceNow CMDB into the EA meta-model. If so, we would request you to share your mapping between CMDB tables and EA classes.

Thanks.
Prem
jonathan.carter
Posts: 1087
Joined: 04 Feb 2009, 15:44

Hi Prem,

Thanks for this feedback.

On your issue with the unicode character sets, you have done precisely the correct thing.
The sample transform on which you have based your XML import is quite basic and is not set up to handle things like unicode. However, 'standardFunctions.py' is shared between the Excel and XML imports and does provide the operations that you need. That special first line:

Code: Select all

# -*- coding: UTF-8 -*-
is very important; it declares the source encoding of the script, which is what forces Python into operating in unicode mode.
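
For example (just an illustration, not code from the Import Utility): with the coding declaration in place, a UTF-8 encoded script can contain unicode literals directly, whereas without it Python 2 rejects the non-ASCII bytes with a SyntaxError.

Code: Select all

# -*- coding: UTF-8 -*-
# With the coding declaration above, this UTF-8 source file may contain
# unicode literals directly; without it, Python 2 raises
# "SyntaxError: Non-ASCII character ... but no encoding declared".
supplier_name = u"Компания КРИПТО-ПРО"
print repr(supplier_name)   # prints the escaped, ASCII-only representation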

On Problem #2, we have discussed this further in other posts. Following on from one of those, we are looking at some strategies for performance improvements for this. There is a lot of work being done to import 36000 products into the repository, normally multiple operations per simple_instance tag.

I personally have some good experience of importing from another CMDB (not ServiceNow) into Essential, but the volumes were significantly lower.

A question for you: Are you importing the definitions of ALL the products that ServiceNow have defined or just those products that are in use at your organisation?

Jonathan
Essential Project Team
mdpremkumar
Posts: 26
Joined: 17 Apr 2017, 09:06

Hi Jonathan,

Thank you so much for the reply.

To your question,
A question for you: Are you importing the definitions of ALL the products that ServiceNow have defined or just those products that are in use at your organisation?
As a first step, we tried to populate only software technology products from a table named cmdb_ci_spkg. It actually has around 130000 entries, but we filtered it down to 36000 records. We are in discussions about filtering down to only the data we actually require, but in reality we may have a large volume of data from several other tables, hence we initially tested with 36000 records.

One more point: during our sync process, we need to populate data into EA classes such as Technology_Product, Supplier, Technology_Node, Application_Provider, etc. For this, would you suggest using one large XML file containing all of this data and then triggering the XML activity, or instead having one XML file per target class and triggering an import activity for each one?

If the suggestion is to use individual XML files, for example two XML files, one for Technology_Product and the other for Technology_Node, and we post them to the import activity using consecutive REST API calls, does the utility process both of them in parallel or one by one?

In the API, I have seen that there is a parameter for a response callback where we can receive the results, but I am not sure how this parameter could be set since we are using a standalone C# program to call the REST APIs. If you have some hints for this, please share. From my side I will also try to figure out how to use the callback URL; if that works properly, we can initiate a REST API call only once the prior call has completed its execution. This way, I think, we can reduce the load on the import utility.

Thanks.
Prem
jonathan.carter
Posts: 1087
Joined: 04 Feb 2009, 15:44

Hi Prem,

Thanks for giving us some more background about what you’re aiming for with the ServiceNow content. It sounds like this could be done as a single once-off initial load and then ongoing updates of changes.

In terms of updating your repository content for Technology Products, Suppliers, etc., either approach (bring them all in together, or have separate source XML files) is fine. I've certainly done imports from a CMDB that combine all of those classes in a single XML.

Performance-wise, it may be worth noting that whilst the XML service interface in the Import Utility can receive multiple requests, each request is queued to manage the load on the Import Utility server. This also ensures that there are no cross-talk issues from running imports in parallel.

If you are running separate imports for each type, the Import Utility framework will create instances, if required, for elements that will be fully imported in another import. For example, if I import a set of Technology Products and want to associate them with a Technology Node that will be imported later, the Product import will create the Node with just its name defined. When the Technology Node import runs, that new Node will be picked up, updated and populated with the complete content from the Node import.

The response callback is a little bit like a WebHook and the Import Utility will perform an HTTP POST to the URL that you specify in that parameter of the request to execute the import. It won’t matter what technology you use to build / configure that HTTP endpoint and the documentation describes the XML document that will be POSTed to your URL. Some people use an ESB or integration platform on which they can quickly define an HTTP adapter that writes this XML, e.g. to a log or similar.
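
If it helps, here is a rough sketch of such a listener in Python. This is not part of the Import Utility; the port and log file name are arbitrary, and the exact XML payload it receives is described in the Interface Specification.

Code: Select all

import BaseHTTPServer

class CallbackHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the XML results document that the Import Utility POSTs back
        length = int(self.headers.getheader('content-length', 0))
        body = self.rfile.read(length)
        # Append the XML to a log file so the results can be reviewed later
        with open('import_results.log', 'a') as log:
            log.write(body + '\n')
        self.send_response(200)
        self.end_headers()

if __name__ == '__main__':
    # Port 8085 is an arbitrary choice for illustration only
    BaseHTTPServer.HTTPServer(('', 8085), CallbackHandler).serve_forever()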

In terms of load on the server, it queues requests and runs each import individually, in series, so there should be no need to wait for a success response before submitting another request.

If you haven't already got a copy of the Interface Specification for the Import Utility, you can download it here.

Jonathan
Essential Project Team
mdpremkumar
Posts: 26
Joined: 17 Apr 2017, 09:06

Hi Jonathan,

Thank you so much for all the details you have provided.
I also noticed that, on posting two consecutive requests to the import utility, it processes them one by one.
Thanks for the link to the latest interface documentation.

Cheers,
Prem