Is adding relations in source xml files possible?

peter van mortsel
Posts: 8
Joined: 13 Dec 2010, 12:31

Hi,

I've set up Protégé 3.4.4 + EAM and I'm trying to import some data using XML files via the Essential Integration Tab.

I have created an XML file containing employees and an XML file containing organisational units. I have even managed to create internal references inside the organisational units. So I can express, for instance, that the unit "Social support" is a sub-actor of the parent actor "HRM department" by means of the slot_reference "is_member_of_actor".

However, when I try to link an employee (Individual_Actor) to an organisational unit (Group_Actor), I run into trouble. I have tried using both the name of the simple_instance and the value of the name slot of an organisational unit, in the slot_reference "is_member_of_actor", to link an employee to the unit he or she works for, but then the import fails. Can these kinds of links only be made manually using the forms GUI, or is there a way to tell the XML file that a specific Individual_Actor is linked to a specific Group_Actor?

I guess there is a third candidate key, the internal name, to make the link, but I don't see a way to put that value inside my XML source files, which are based on our HRM database, where that value is not known.

So, is XML-based linking possible? If not, are there better approaches to import the data and especially the relations between the class instances?

Thanks in advance,
Kind regards,
Peter.
jonathan.carter
Posts: 1087
Joined: 04 Feb 2009, 15:44

Hi Peter,

Apologies for not replying sooner.

Yes, it is certainly possible to create such relationships during an import, both direct relationships like the one that you are trying to create and relationships that involve instances of relationship classes (e.g. ACTOR_TO_ROLE_RELATION).

Without knowing more about the structure of the XML that you have created, I think it sounds like you are using the name of the Group Actor rather than its instance ID when defining that relationship.

All relationships in Protege are managed via the instance IDs rather than the names and this is a key foundation for how the repository works. However, this does make things like integration a little more complex and this is one of the reasons that the integration tab needs some form of transform to use a library of integration functions (that we have created for Essential) that take care of this sort of thing for you.

Some - perhaps most - people find that the most straightforward approach to importing XML is to create their source XML, using the structure of the reportXML.xml Essential Viewer snapshot document - in particular if you are in control of creating this XML document.
If you can create your XML according to this schema (Essential XML schema) then you can use the supplied 'importEssentialInstances.xsl' transform to import your XML.

If you are taking this approach, you need to make sure that you provide a <simple_instance> tag for each Actor (Group / Individual) that you need to import. In this <simple_instance> tag, the simple_instance/name tag is the instance ID. Make sure that each ID is unique across your source repository (typically, your HR system or equivalent repository will have a unique ID for each that you can use). This is the ID that you should then use in the <value> tag of the simple_instance/own_slot_value whose slot_reference is is_member_of_actor, in the following structure:
<simple_instance>
  ...
  <own_slot_value>
    <slot_reference>is_member_of_actor</slot_reference>
    <value value_type="simple_instance">YOUR_GROUP_ACTOR_ID</value>
  </own_slot_value>
  ...
</simple_instance>
where YOUR_GROUP_ACTOR_ID is the unique ID for the group that you want an individual to be a member of - it can be anything as long as it is unique.
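
As a rough sketch of how source XML with this structure could be generated from HR data, the Python below builds two <simple_instance> elements with the standard library. The element names come from the structure above and the <type> tag; the root element name "knowledge_base", the "name" slot, the "string" value type, and the IDs GROUP_0001 and EMP_0001 are assumptions made for illustration and should be checked against a real reportXML.xml before use.

```python
import xml.etree.ElementTree as ET


def simple_instance(instance_id, class_name, slots):
    """Build one <simple_instance> element.

    slots is a list of (slot_name, value_type, value) tuples.
    """
    inst = ET.Element("simple_instance")
    ET.SubElement(inst, "name").text = instance_id  # the instance ID
    ET.SubElement(inst, "type").text = class_name   # the class of the instance
    for slot_name, value_type, value in slots:
        osv = ET.SubElement(inst, "own_slot_value")
        ET.SubElement(osv, "slot_reference").text = slot_name
        val = ET.SubElement(osv, "value", value_type=value_type)
        val.text = value
    return inst


# Assumption: "knowledge_base" as root element; verify against reportXML.xml.
root = ET.Element("knowledge_base")

# Hypothetical group actor, identified by a unique ID from the source system.
root.append(simple_instance(
    "GROUP_0001", "Group_Actor",
    [("name", "string", "HRM department")],
))

# Hypothetical employee whose is_member_of_actor value is the group's ID.
root.append(simple_instance(
    "EMP_0001", "Individual_Actor",
    [("name", "string", "Example Employee"),
     ("is_member_of_actor", "simple_instance", "GROUP_0001")],
))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The point of the sketch is only that the relationship value carries the group's instance ID (the content of its simple_instance/name tag), not its display name.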

The alternative approach is to write your own transform. With this approach you are transforming your source XML into a Python integration script that the Integration Tab will execute. This sounds very complex but in fact, your Python consists of calls to the integration functions I mentioned above. This approach provides a lot of flexibility for transforming source XML with more complex processing during the integration - particularly useful for importing or creating relationships during the integration.

If you look at the source code for the 'importEssentialInstances.xsl' and have a look at the Introductory Guide to creating integration transforms, you'll get a better idea of what's involved in creating your own transform (if you have not already done so).

I'll leave this post here for now. Let me know more about your source XML (are you in control of its structure etc.) and the transform that you are using and we can take it from there.

Jonathan
Essential Project Team
peter van mortsel
Posts: 8
Joined: 13 Dec 2010, 12:31

Hi Jonathan,

Thanks for picking up my post.

Indeed, I use the Essential XML schema to import the data, and I'm using the Integration Tab with importEssentialInstances.xsl as the transform and the include directory where standardFunctions.txt is located.

I have put together some screenshots in one file.
I hope they make really clear what I'm trying to do ...

First I made an XML file storing organisational units and a Python script to create the class PsEntiteiten, a subclass of Group_Actor.
After executing the script via the script console tab, I can import the XML via the Essential Integration tab without problems.

Secondly I made an XML file storing employees and a Python script to create the class PxEmployees, a subclass of Individual_Actor. In this class I have a slot whose value refers to the instance name in the first file.

If I create a string slot for this (foreign key) field in PxEmployees I can import the second XML file without issues. Of course, I then have no relations between the instances of the two classes.

However, when I use the slot reference "is_member_of_actor" with value type "simple_instance" I get the error output below.

The screenshots show concretely that I want to link my personal employee data to the entity I work for, PS_ENT_0383, which is the name of the simple instance, not the value of the name slot in the instance, which is DLOG_DICT_...

So I am out of ideas at the moment.

Thanks for looking into my issue.
Kind regards
Peter.


Starting... Initialised Integration Engine
Source information transformed successfully.
Created new instance: PS_EMPL_Roothans Peter (016606_0), Essential name: Roothans Peter (016606_0)
Script Exception:
javax.script.ScriptException: null
at com.sun.script.jython.JythonScriptEngine.evalCode(JythonScriptEngine.java:292)
at com.sun.script.jython.JythonScriptEngine.eval(JythonScriptEngine.java:170)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:76)
at com.enterprise_architecture.essential.scripting.ScriptJob.execute(ScriptJob.java:214)
at com.enterprise_architecture.essential.integration.core.IntegrationEngine.processIntegrationScript(IntegrationEngine.java:422)
at com.enterprise_architecture.essential.integration.core.IntegrationEngine.execute(IntegrationEngine.java:237)
at com.enterprise_architecture.essential.integration.widgets.EssentialIntegrationTab$2.construct(EssentialIntegrationTab.java:771)
at com.enterprise_architecture.essential.integration.widgets.SwingWorker$2.run(SwingWorker.java:110)
at java.lang.Thread.run(Unknown Source)
Caused by: Traceback (innermost last):
File "<unknown>", line 20, in ?
File "\\storage03\paupero$\myprojects\protege\Applicationdata/standardFunctions.txt", line 159, in getEssentialInstanceContains
AttributeError: 'NoneType' object has no attribute 'getDirectInstances'

at org.python.core.Py.AttributeError(Unknown Source)
at org.python.core.PyObject.noAttributeError(Unknown Source)
at org.python.core.PyObject.__getattr__(Unknown Source)
at org.python.core.PyObject.invoke(Unknown Source)
at org.python.pycode._pyx1.getEssentialInstanceContains$10(\\storage03\paupero$\myprojects\protege\Applicationdata/standardFunctions.txt:159)
at org.python.pycode._pyx1.call_function(\\storage03\paupero$\myprojects\protege\Applicationdata/standardFunctions.txt)
at org.python.core.PyTableCode.call(Unknown Source)
at org.python.core.PyTableCode.call(Unknown Source)
at org.python.core.PyFunction.__call__(Unknown Source)
at org.python.core.PyObject.__call__(Unknown Source)
at org.python.pycode._pyx0.f$0(<unknown>:20)
at org.python.pycode._pyx0.call_function(<unknown>)
at org.python.core.PyTableCode.call(Unknown Source)
at org.python.core.PyCode.call(Unknown Source)
at org.python.core.Py.runCode(Unknown Source)
at com.sun.script.jython.JythonScriptEngine.evalCode(JythonScriptEngine.java:289)
... 8 more
javax.script.ScriptException: null
Error: An error or exception occurred during the execution of the integration script. See output for more information.
EAMimport.png
jonathan.carter
Posts: 1087
Joined: 04 Feb 2009, 15:44

Hi Peter,

The error at the heart of this is the line:
File "\\storage03\paupero$\myprojects\protege\Applicationdata/standardFunctions.txt", line 159, in getEssentialInstanceContains
AttributeError: 'NoneType' object has no attribute 'getDirectInstances'
This is basically just like a Java null pointer exception.
What's going on is that getEssentialInstanceContains() - which is used to find a specific instance in the repository OR create it if it is not found - is attempting on line 159 to find all the instances of the class that is specified in its first argument.

However, the class that's been specified cannot be found in the repository. It's important to note that the integration tab does not create new classes - only instances of 'known' classes.

We can see from the output, that it has created a new instance:
PS_EMPL_Roothans Peter (016606_0) but fails on the next instance to import.
I notice that you've extended the meta model under the Group_Actor meta class. Check that the class names in your source XML (the <type> tags) match the class names in your extended meta model and that if this next instance is of another extended meta class type that this class is defined in your target repository.

I think the key points here are:
  • The integration tab imports instances only, not classes. (We manage updates to the meta model separately via meta model scripts)
  • Your import is failing on the second instance in your source XML when it attempts to import an instance that is of an undefined type (class).
It's interesting that you have chosen to extend the Actor meta model class as you have. In fact, these types that you've defined as sub-classes of Group_Actor are often captured as instances of Roles or Role Types in the Logical and Conceptual layers respectively. Alternatively, a new Taxonomy capability is available as part of the Strategy Management Pack that is about to be released. These approaches enable you to classify Actors without having to extend the meta model.

Have a look at the instance after the PS_EMPL_Roothans Peter (016606_0) instance in the source XML and check the class name that is used in the <type> tag.
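
One way to run that check before importing is sketched below: it collects every class name used in a <type> tag of the source XML and reports those that are not in a list of classes known to exist in the target repository. The list of known classes has to be supplied by hand here, and the sketch assumes the source XML declares no XML namespace (if it does, the tag names would need to be namespace-qualified). The class PxEmployees is used as the example of a class missing from the repository.

```python
import xml.etree.ElementTree as ET


def unknown_types(source_xml, known_classes):
    """Return the sorted class names used in <type> tags that are not in known_classes."""
    root = ET.fromstring(source_xml)
    used = {t.text.strip() for t in root.iter("type") if t.text}
    return sorted(used - set(known_classes))


# Small stand-in for a source file; the root element name is an assumption.
sample = (
    "<knowledge_base>"
    "<simple_instance><name>A1</name><type>Individual_Actor</type></simple_instance>"
    "<simple_instance><name>A2</name><type>PxEmployees</type></simple_instance>"
    "</knowledge_base>"
)

# Classes assumed to be defined in the target repository.
known = {"Group_Actor", "Individual_Actor"}

missing = unknown_types(sample, known)
print(missing)  # ['PxEmployees']
```

Any class name reported here would make the lookup inside the import return nothing, which matches the 'NoneType' error in the trace.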

Let me know how you get on - and if you have any further problems, let me know.

Jonathan
Essential Project Team
peter van mortsel
Posts: 8
Joined: 13 Dec 2010, 12:31

Hi Jonathan,

Thanks for clearing things up.

In order to solve my problem I took the following steps ...

- I removed all the subclasses I made under Group_Actor and Individual_Actor
- I imported all our organisational units in the class Group_Actor using the integration tab
- This works fine, including hierarchical structures
- The xml screenshot illustrates unit PS_ENT_0381 (DLOG_DICT_APGI_APPL) is member of PS_ENT_0395 (DLOG_DICT_APGI)
- The GUI screenshot illustrates you can click to parent 0395 from within 0381, that's really great
- I removed all instances from the XML with employee data except my own instance
- I imported the XML with only my instance using the Essential Integration Tab
- cf. the XML on the right-hand side

Unfortunately the same run-time error occurs.
My employee data is imported but the link to the entity I work for is not created.

Then ...

I changed the value in the is_member_of_actor slot reference, trying each of the following values:
- PS_ENT_0381, being the instance name I used when importing the first XML file with organisational units
- 0381, being the value of the name slot inside the instance
- essential_baseline_v1.2_Class20742, being the instance name created by Protégé (I guess)
- I found this in reportXML.xml, created using the Essential Architecture Reporting tab
- Peoplesoft::PS_ENT_0381, being the external repository instance reference

No matter what key I use, the link in my Individual_Actor employee data to my Group_Actor organisational unit is not created.

So I started wondering whether it is possible to create "Member of Parent Actor" relations between classes that are not subclasses
of each other, since Group_Actor and Individual_Actor share the same parent class, but are not subclasses of each other.

So I tried to create the link manually using the GUI. That works fine. I just click the "Add instance" button
of the "Is a Member of Parent Actor" field and I can link the (Group_Actor) entity 0381 to my (Individual_Actor) employee data.

I thought using my own subclasses might have caused the problem, but that doesn't seem to be the case,
since I have now used only classes predefined in the EAM model.

I should take some time to read the documentation and improve my knowledge of the model ... and I could start using
the classes Roles or Role Types as you suggest, but I wonder whether that will solve my issue ... since then I also need to be able
to create references between instances of different classes via XML import.

I hope that with this additional information you can spot the mistakes in what I'm trying to do.
All suggestions welcome.

Thanks
Peter.
import_issue_20101222.png
peter van mortsel
Posts: 8
Joined: 13 Dec 2010, 12:31

Hi Jonathan,

Before writing my previous post I used two XML files: one with organisational units, one with employee data.
Now I have manually added my employee data instance at the end of the XML file with organisational units.

This seems to work. Now the link to the entity PS_ENT_0381 from within my employee data is recognised and
added without errors. So, I'll just have to find a way to easily merge the data I retrieve from our data sources
into one file.

So, now I have 3 questions left to solve.

1)
I'm still wondering whether referencing instances can be realised using two or more XML files (a separate XML file for each class).

2)
Is it possible to keep the repository up to date with automated import?
Let's say an employee retires or moves to another firm and is archived in the source system. Is there a way I can tell Protégé that the employee should be deleted using an XML import, or do I need to remove all instances and import a new export from the source system?

3)
Finally I'll dig into reporting.
I wonder whether I can easily export the data into a relational database, since I'd like to implement some dynamic (form-based) reporting. Let's say I'd like users to be able to request a report of all employees older than 40 working in a specific organisational unit. Then I'll have to supply a form in which users can select one or more organisational units and enter an age ... or is this kind of filtering already possible with the standard EAM reporting tools without developer assistance?

Thanks
Kind regards
Peter.
jonathan.carter
Posts: 1087
Joined: 04 Feb 2009, 15:44

Hi Peter,

Thanks for posting back and apologies for the delay in getting back to you.
Glad to hear that you got your import working by combining the two XML files.

To answer your questions:

1) It is possible to use multiple source files to import information from multiple external sources. Any instance in the repository can have more than one external reference ID (its ID in an external source). However, each of these should be unique, so that the instance can be accurately identified. The import functions provided in the 'standardFunctions.txt' file in the Integration Tab plugin create a unique ID by combining the name of the external source (you have to define an 'external repository' name for each source) with the instance ID in that source. NOTE: If you've been using the 'importEssentialInstances.xsl' transform to import your data, there's a recent update available to download.

So, having a separate XML file for each class is certainly possible, but the Integration Tab works such that it can only reference things that are in the repository. This means that if you are defining relationships between instances, you have to make sure that both instances are defined in the repository before relating them. However, Protege works in a really neat way that means that you can define a new instance with almost no details, relate it to another instance and then return to it and complete its definition. I use this approach in some of the meta model update scripts, and the GetEssentialInstanceXXX() functions in 'standardFunctions.txt' work like this.
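
For the workaround reported earlier in the thread (putting units and employees in one file, with the employee after the unit it references), a minimal sketch of merging several source documents in a fixed order might look like this. The root element name "knowledge_base" is an assumption, the documents are assumed to declare no XML namespace, and the two one-instance documents are stand-ins for real exports.

```python
import xml.etree.ElementTree as ET


def merge_sources(*xml_documents):
    """Concatenate the <simple_instance> elements of several documents, in the order given."""
    merged = ET.Element("knowledge_base")  # assumed root element name
    for doc in xml_documents:
        for inst in ET.fromstring(doc).iter("simple_instance"):
            merged.append(inst)
    return merged


units = (
    "<knowledge_base><simple_instance>"
    "<name>PS_ENT_0383</name><type>Group_Actor</type>"
    "</simple_instance></knowledge_base>"
)
employees = (
    "<knowledge_base><simple_instance>"
    "<name>PS_EMP_016606</name><type>Individual_Actor</type>"
    "</simple_instance></knowledge_base>"
)

# Units are passed first so that they precede the employees that refer to them.
merged = merge_sources(units, employees)
order = [inst.findtext("name") for inst in merged.iter("simple_instance")]
print(order)  # ['PS_ENT_0383', 'PS_EMP_016606']
```

Whether the order inside a single file actually matters to the import is not established here; the sketch simply reproduces the arrangement that was reported to work.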

2) Yes, it is most certainly possible to keep the repository up to date with an automated import. The architecture of the Integration Tab is such that it uses underlying components that could be used by e.g. a Servlet that is gathering the external information and then running the update via a remote connection. Note, however, that this would require the repository to be running in multi-user mode. For this reason, to make things easy to use, in particular for stand-alone mode, we've designed the Integration Tab to run as a tab that must be manually invoked.

However, you've touched on an interesting point about removing instances from the repository. It is certainly possible to define an import that instead of creating instances, deletes them. The important issue here is that these must be known explicitly as instances that should be deleted. It is not safe to make any inferences or assumptions about removing instances just because the instance was not mentioned in the latest import.

If you have, e.g. an XML file of employees to remove, then you could safely automate this with an 'import' that deletes the specified instances. After it's run, some house keeping on the repository would be recommended to identify and remove any orphaned instances.

Each time an instance is imported, the timestamp of that import is recorded and I've used these in the past to do things like identify the instances of particular types that were not included in the last import. And from this, identified instances that are candidates for removal.

I think the bottom line is that you can take whichever approach makes most sense to you with the external information that you have but it is very important to be clear about any instance before automating any deletes.
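
The timestamp idea can be sketched in a few lines of Python. This assumes the last-import timestamp of each instance has already been extracted from the repository into a dictionary (how to read it out is not shown, and the IDs PS_EMP_000001 and PS_EMP_000002 are made up). The function only lists candidates for review; it deliberately deletes nothing.

```python
from datetime import datetime


def removal_candidates(last_imported, latest_import_start):
    """Return IDs whose last import predates the latest import run, for manual review."""
    return sorted(
        instance_id
        for instance_id, stamp in last_imported.items()
        if stamp < latest_import_start
    )


# Hypothetical timestamps: when each instance was last touched by an import.
last_imported = {
    "PS_EMP_000001": datetime(2011, 1, 3, 9, 0),
    "PS_EMP_000002": datetime(2010, 12, 22, 9, 0),
    "PS_EMP_016606": datetime(2011, 1, 3, 9, 0),
}

# The most recent import run started shortly before 09:00 on 3 Jan 2011.
candidates = removal_candidates(last_imported, datetime(2011, 1, 3, 8, 55))
print(candidates)  # ['PS_EMP_000002']
```

An instance in this list was simply not mentioned in the latest import; whether it should really be removed still needs an explicit decision.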

3) I know that some users are reporting directly against the RDBMS that is holding the repository (database backend). This is rather convoluted, however, as this database is highly normalised and any queries need to navigate both Protege's meta-meta-model and the Essential Meta Model.

A simple approach that I've used in the past is to create CSV Views that can be explored in Excel or similar and do filtering etc.

We've also created Views that take parameters to do the sort of filtering that you describe, e.g. by creating a View that presents a form to define the filter, which then invokes a parameterised View to provide the required View. In this way, there's a little more web development to build the views but then no development at all for your users.

Finally, as the Viewer environment is working from a very straight-forward XML document itself, it is relatively easy to take the whole (or subsets) of the repository XML and import that into a suitable database for more SQL-like reporting.
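
As an illustration of that last option, the sketch below loads every own slot value from a small XML document into a single narrow table and then answers a membership question with plain SQL. SQLite is used only to keep the example self-contained; the table layout, the root element name and the "name" slot are assumptions, and a real reportXML.xml may declare an XML namespace that the tag names would then need to include.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_instances(xml_text, connection):
    """Load every own slot value into one table: (instance_id, class_name, slot, value)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS slot_values "
        "(instance_id TEXT, class_name TEXT, slot TEXT, value TEXT)"
    )
    rows = []
    for inst in ET.fromstring(xml_text).iter("simple_instance"):
        instance_id = inst.findtext("name")
        class_name = inst.findtext("type")
        for osv in inst.iter("own_slot_value"):
            slot = osv.findtext("slot_reference")
            for value in osv.iter("value"):
                rows.append((instance_id, class_name, slot, value.text))
    connection.executemany("INSERT INTO slot_values VALUES (?, ?, ?, ?)", rows)
    return len(rows)


sample = (
    "<knowledge_base>"
    "<simple_instance><name>PS_ENT_0383</name><type>Group_Actor</type>"
    "<own_slot_value><slot_reference>name</slot_reference>"
    '<value value_type="string">DLOG_DICT_APGI_INTF (0383)</value></own_slot_value>'
    "</simple_instance>"
    "<simple_instance><name>PS_EMP_016606</name><type>Individual_Actor</type>"
    "<own_slot_value><slot_reference>name</slot_reference>"
    '<value value_type="string">Roothans Peter</value></own_slot_value>'
    "<own_slot_value><slot_reference>is_member_of_actor</slot_reference>"
    '<value value_type="simple_instance">PS_ENT_0383</value></own_slot_value>'
    "</simple_instance>"
    "</knowledge_base>"
)

conn = sqlite3.connect(":memory:")
count = load_instances(sample, conn)

# Which instances are members of unit PS_ENT_0383?
members = conn.execute(
    "SELECT instance_id FROM slot_values WHERE slot = ? AND value = ?",
    ("is_member_of_actor", "PS_ENT_0383"),
).fetchall()
print(count, members)  # 3 [('PS_EMP_016606',)]
```

A single slot/value table avoids having to know every class and slot in advance, at the cost of needing self-joins for queries that combine several slots.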

Hope this helps

Jonathan
Essential Project Team
peter van mortsel
Posts: 8
Joined: 13 Dec 2010, 12:31

Hi Jonathan,

Best wishes for 2011 and thanks for the clear answers that help me moving on with the import.

Regarding your answer 1, I have some questions.
You state I could use separate XML files.
So I could ...

1st - import all organisational units
2nd - import all employees
3rd - import all relations (the references to organisational units from within employees)

Actually, since it's a one way relationship I think I can merge step 2 and 3 ...

1st - import all organisational units
2nd - import all employees and their references to organisational units, since these surely already exist.

Then I still wonder, when I use separate XML files, how I can reference an organisational unit
from within the employee records.

This turned out to be easy when using one XML file ...

I created organisational unit with instance name PS_ENT_0383 and then I created employee with instance name PS_EMP_016606 referencing organisational unit PS_ENT_0383 as parent actor.

Unless I made a stupid mistake that I do not see at this moment, this turned out not to work when using two XML files.
What key should I use to reference an organisational unit inside an employee instance when first importing organisational units with one XML file and then employees with another XML file?

The screenshot might clarify the idea.
On the left-hand side I have an organisational unit with instance name PS_ENT_0383, which was imported first.
The entry in "Contained sub Actors" is empty at that moment since I have not imported employees yet.
Then I import employees with a separate file, so instance name PS_EMP_016606 is imported and a reference is made
to PS_ENT_0383. This works when I merge organisational units and employees in one file, but it does not work
when I use 2 XML files ...

So, what key should I use in the second file with employees to reference organisational units?
The Actor Name "DLOG_DICT_APGI_INTF (0383)", the external repository instance reference "Peoplesoft::PS_ENT_0383",
the instance name I used when importing the first file, "PS_ENT_0383", or the unique ID generated by the
import (something like essential_baseline_v1.2_Class#####)?

None of these seems to work, unless I made a stupid mistake that I do not see.
Or should I perhaps (something I have not tested yet) explicitly add the key (in this case PS_ENT_0383) in the "External Reference Links" of the organisational units in order to be able to make the link from employee to organisational unit when importing the file with employee data?

Or, by "you have to define an 'external repository' name for each source", do you mean that I have to create an instance "PeopleSoft" or "ActiveDirectory" in a dedicated EAM class that I don't know yet, and then reference it in the "Source Repository" field of the Integration Tab?

Do I also understand correctly that you can add more than one external key to one instance? So for actor Peter Roothans I could add a reference to my ID in our HR system, PS_EMP_016606, and in the same instance a reference to my unique ID in our directory service or email server. Right?

Thanks
Kind regards
Peter.
import_issue_20110103.png
Kevin Campbell
Posts: 40
Joined: 13 Sep 2010, 20:26

Peter

I realise you're quite a long way down the XML path, but I wanted to expand on something mentioned earlier: you can also create Python scripts to perform integration, and in fact I found this rather easier. The following example associates virtual machines with their technology nodes and operating systems.

newnode = getEssentialInstance("Technology_Node","WCAS150","VM","WCAS150")
host = getEssentialInstance("Technology_Node","HCAS302.CHEMD.NET","VM","HCAS302.CHEMD.NET")
os = getEssentialInstance("Technology_Product","P8413","StandardsRepository","")
status = getEssentialInstance("Deployment_Status","live","StandardsRepository","")
osInstance = getEssentialInstance("Infrastructure_Software_Instance","os:WCAS150","VM","os:WCAS150")
setSlot(osInstance,"technology_instance_deployed_on_node",newnode)
setSlot(osInstance,"technology_instance_of",os)
setSlot(osInstance,"technology_instance_deployment_status",status)
osInstance.addOwnSlotValue(kb.getSlot("technology_instance_given_name"),"os:WCAS150")

The foundation of this is the getEssentialInstance function, defined in standardFunctions.txt as:
def getEssentialInstance(theClassName, theExternalRef, theExternalRepository, theInstanceName)

On a separate note related to reporting: if you're interested, I created a set of stored procedures that will automatically create and populate relational tables representing all the populated class instances within the repository. For certain types of tabular listings we found this to be a quicker approach.

Kevin
peter van mortsel
Posts: 8
Joined: 13 Dec 2010, 12:31

Hi Kevin,

So based on your source data you generate a large Python script (instead of XML) that you can execute using the Script tab ... or can you also execute the script via the command line, so it can be scheduled to run unattended? In that case I might consider rewriting the XML generation as Python script generation. So, do you see advantages in using the Python script instead of XML in combination with the Integration Tab?

Actually, I also just wrote some code to generate CREATE TABLE statements and INSERT statements based on class names and slots in the reportXML.xml file, so we can use regular SQL and our standard reporting tools in case we are not able to create everything we need with Essential Viewer. The table scripting and insert statements work fine except for the fields containing large strings, since by default I create columns with a length of 200 characters. Perhaps I should check the length and, if required, issue an ALTER COLUMN statement to change the width before executing the insert. Do your stored procedures cope with that? Being mainly busy with DBA tasks, I am certainly interested in how you did it. I used some .NET code to loop over the nodes and generate the SQL code.

Thx
Peter.
jonathan.carter
Posts: 1087
Joined: 04 Feb 2009, 15:44

Hi Peter - and Kevin!

Kevin - thanks for your post.

Peter - I think the step that you might be missing is that the transform XSL that is specified in the Integration Tab is a transform that processes the source XML to produce the Python calls that Kevin talks about. The 'importEssentialInstances.xsl' file that is provided in the Integration Tab (download the latest version and have a look), provides a demonstration of how this is achieved.
The Integration Tab executes this transform on your source XML, building the Python script and then runs the Python script to import the instances.

Using the functions that Kevin's described, you can find (or create if they are not found) the instances that you need in order to build the relationships. The getEssentialInstance() functions are the key to this.
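
To make the "transform produces Python calls" idea concrete, here is a small stand-alone sketch that generates the text of such calls for one employee and one organisational unit. It only builds strings; nothing is executed against a repository. The argument order of getEssentialInstance follows the signature quoted above, and setSlot is used as in Kevin's example. Whether setSlot is the right call for a slot that can hold several values, and whether "Peoplesoft" matches the external repository name in use, are assumptions to verify against standardFunctions.txt. IDs containing double quotes would also need escaping.

```python
def relation_script(employee_id, group_id, repository):
    """Return integration-script lines that link one employee to one group actor."""
    return [
        'group = getEssentialInstance("Group_Actor", "%s", "%s", "%s")'
        % (group_id, repository, group_id),
        'employee = getEssentialInstance("Individual_Actor", "%s", "%s", "%s")'
        % (employee_id, repository, employee_id),
        'setSlot(employee, "is_member_of_actor", group)',
    ]


# IDs taken from the thread; the repository name is an assumption.
script = "\n".join(relation_script("PS_EMP_016606", "PS_ENT_0383", "Peoplesoft"))
print(script)
```

Because the group actor is looked up (or created) before the slot is set, the generated calls do not depend on the unit and the employee arriving in the same source file.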

I agree with your approach to building the SQL database for reporting. The reportXML.xml file is a really nice schema for working with reporting - a lot more immediately clear than the underlying database backend schema - and that's one of the reasons that we went with XSL/XML-based reporting for most of our views.

Jonathan
Essential Project Team
peter van mortsel
Posts: 8
Joined: 13 Dec 2010, 12:31

Thanks Jonathan,

I haven't analysed the Python script. I indeed assumed that this script performed the import directly, but now I understand it generates another Python script that does the actual import. Right? Is the generated script only built in memory and executed, or is it stored somewhere on the filesystem? In the latter case, it could help me investigate what goes wrong when I want to create links between instances of different classes while importing them from separate XML files.

Regarding the XML/XSL reporting, we'll have to take some time to learn to write XSL that answers complex questions. Most of my colleagues can write complex SQL statements on relational databases very quickly, while we have no real specialists in XSL/XML querying. In addition, we are a little afraid that XSL/XML reporting might be relatively slow in comparison with SQL reporting when the number of instances grows to 10.000 ... 100.000 ... 1.000.000 rows or more ... but we'll give it a try.

PS: Kevin and I wrote our own procedures to create and populate SQL tables based on the repository. It seems this was already available ... (http://protege.cim3.net/cgi-bin/wiki.pl ... g_Database) but I haven't checked it yet. Does anybody have positive or negative experience with this utility? Does it create ANSI SQL? It was created for MySQL, but we mainly use MSSQL.

Kind regards,
Peter.
jonathan.carter
Posts: 1087
Joined: 04 Feb 2009, 15:44

Hi Peter,

Actually, we've found that the XSL can be very efficient - even when processing 30MB XML documents! The main trick is to reduce the 'working' set as quickly as possible, that is, to apply templates that select only the relevant nodes (instances) in the XML. Although the syntax is not that similar to SQL, I think much of the SQL-style approach to queries works well to help achieve this.

Let us know how you get on and if you do run into any performance issues please feel free to post about them and we can have a look.

Jonathan
Essential Project Team