Simplicity and the Tree Walker

August 24th, 2008 by Oscar Huseyin

Tree Walking is the art of navigating an object graph or relational model through the use of the Visitor pattern (GoF) or any other arbitrary algorithm. The concept is relatively trivial: you start at a node in the interconnected system and then make navigation decisions based on some logic or rules. Most of the time, the structure of the tree is known a priori, so the navigation is analogous to navigating with a map.

Object-to-Relational Mapping, better known as ORM, is one of the applications where we can see Tree Walking in practice. For example, a domain model which defines associations between entities can be instantiated to reveal a complex tree. Here is a typical example:

[Figure: UML class diagram of the domain model (simplicityAnfTheTreeWalker.gif)]

In the above example, we can see the domain model, where the associations and cardinality of each entity are illustrated. Although the above UML represents classes, it is not an instance view. The instance view is significantly different and more complex to represent, given that some association properties of the model are unbounded. Therefore, the instance view for the given model is best left to the imagination.

Continuing with our ORM discussion, we can map the above model onto database tables using metadata and achieve the ORM solution. Up till now, the majority of the implementation details have been inherited from the ORM tool selection, e.g. using annotations or XML to implement the metadata. What’s left now are the answers to the “hard” questions, typically relating to resource management, performance views, dependency management, transaction management and a whole list of other Application Architecture concerns.

One particular Application Architecture concern that I have seen poorly defined and designed time and time again is the handling of the object graphs that are central to ORM. To elaborate, object graphs are used throughout the client and business tiers of an application. They can be constructed from the view and passed to the business tier for persistence, or generated by the business tier for consumption (i.e. rendering) by the web tier. The business tier is where my arch-nemesis, the Tree Walker, often lives. For example, ORM implied constraints like lazy loading can be a large enough force for some Application Architects to mandate a form of Tree Walking to mitigate N+1 selects problems, specifically to improve application performance. The choice seems obtuse to me, as the complexity associated with Tree Walkers often leads to architectural debt which is, more often than not, remediated later in project maintenance cycles, mostly by removing the Tree Walker.

A recent sighting of a Tree Walker had me gasping at the unnecessary complexity it introduced into the persistence layer. After inspecting the anatomy of the Tree Walker, as with all my other Tree Walker sightings, I was able to dismiss its requirement in the solution very easily. This particular application architecture was simple: JSF, Hibernate and Spring, all contained in a single WAR deployed in a Servlet container (i.e. no EJB). I was able to dismiss its requirement citing:

  • The Hibernate session is in scope of the View and does not need a shortened connection lifecycle.
  • Lazy collections can be hydrated from the database when the view is walking the object graph in the rendering phase of the request.
  • Why not use already proven and mature Hibernate HQL to fetch the required object graph shape?

The main purpose of the Tree Walker in this instance was to (a) optimise the database access by ensuring all associations in the object graph are lazy when first loaded into memory from the database and (b) reduce the time connections to the database are kept open, while still avoiding the Hibernate LazyInitializationException.

Even if the database session was not in the scope of the View (i.e. in the business tier and behind an EJB), point 3 from the above list would still hold. This made me think about my previous blog entry on a similar topic: N+1 has leaked into my service interfaces. Does the Tree Walker stop the ORM constraints from leaking into my service interfaces? Well, no. A developer still needs to define the directions which the Tree Walker will use to navigate the object graph. Although the service interfaces may appear to be simpler, the fact that you need to define a parameter to accept the directions for the Tree Walker means you have not avoided service interface pollution.
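To make the pollution concrete, here is a minimal sketch, written in JavaScript like the other listings on this blog even though the application in question was Java. The walk function, the findCustomer service and the association paths are all hypothetical and are not taken from the project described above; the in-memory object simply stands in for lazily loaded entities.

```javascript
// Hypothetical Tree Walker: touches every association named in `directions`
// so that lazy collections would be initialised before the session closes.
function walk(root, directions) {
    var touched = [];
    directions.forEach(function (path) {
        var nodes = [root];
        path.split(".").forEach(function (name) {
            var next = [];
            nodes.forEach(function (node) {
                // Reading the property stands in for hydrating a lazy association.
                next = next.concat(node[name] || []);
            });
            nodes = next;
        });
        touched.push(path + ":" + nodes.length);
    });
    return touched;
}

// Hypothetical service method. The caller must still say which parts of the
// graph it needs, so the ORM concern is visible in the signature.
function findCustomer(repository, id, directions) {
    var customer = repository[id];
    walk(customer, directions);
    return customer;
}

// Illustrative data only.
var repository = {
    42: { name: "Alice", orders: [{ items: [{ sku: "A" }, { sku: "B" }] }, { items: [] }] }
};
var customer = findCustomer(repository, 42, ["orders", "orders.items"]);
```

The third argument to findCustomer is the point: every caller has to know the shape of the graph and spell it out, which is the same knowledge a purpose-built query would have encoded once.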

Time and time again, I wonder what drives Application Architects, Designers and Developers to develop a solution that is, simply, not required. Is the art of simplicity something that can be learned? How can we reject thoughts that lead us to develop complex architectures when they are clearly not required? This brings me to a belief that I have held for a very long time: intelligence alone cannot buck the forces of over-engineering; it is wisdom that guides a truly skilled architect to a solution that’s both elegant and simple.

Unit of Scalability

July 27th, 2008 by Oscar Huseyin

Recently, some of our clients have asked us to determine an early performance view of an application that they are either considering purchasing or have recently purchased. Considering all the things that make up a J2EE application, this can be difficult to represent. So, how can we present a view of the system’s performance to the customer and satisfy all their concerns around product performance?

The answer is relatively trivial. A performance view can be derived using one of two methods: bottom-up or top-down.

Let’s consider the bottom-up approach. This will typically involve an application architecture review, where all major architectural components of the system are identified and then analysed. Analysis of the components typically involves inspecting the configuration of each component with a view to deriving the performance scalability of the component. A good example here would be the Hibernate 2nd Level Cache. As developers who use Hibernate will testify, the configuration of the 2nd level cache can be very tricky and unforgiving. The cache configuration will typically involve the caching of static and/or dynamic database data. Static data can be cached without much thought, as the data is read-only and will not change during the runtime of the JVM. However, the same cannot be said about dynamic data. Dynamic data requires very careful thought, as the developer is playing against transaction management and isolation levels. A simple oversight in design can cause data integrity issues that are hard to find when performance testing.

With a view of all the architectural components and the way they are configured to interact with each other, a performance architect can derive a scalability view of the system. Depending on the required performance detail, a model can then be formulated which attempts to stochastically derive optimal configuration parameters for the system.

Now, let’s look at the top-down view. Applying operational analysis, we can define the expected end user usage characteristics and create a load model, also known as an Application Simulation Model. This load model is the system input against which the system will be measured for performance. The operational model is the commonly used performance and volume testing method. By applying the load model and measuring system metrics like CPU, memory and network utilisation, the performance engineer can view the system scalability on a given platform.

Having defined the two approaches, let’s consider the analysis of the data and look at the results that we can obtain from both. To the skillful performance engineer, the data provided by the architectural analysis of the bottom-up approach will give clues and allow conclusions to be drawn about the system’s performance capabilities. Continuing with the Hibernate 2nd level cache example, a performance engineer could conclude that the requirement to distribute updates to other caches in a cluster of JVMs will incur an N+1 update overhead. This synchronise operation is, at best, an inter-process call or, in the worst case, a network call to write the cache state changes to the other JVMs. That is not very good for performance if the system requires a large number of JVMs. This heuristic view of a component’s behavior is usually enough to flag a performance issue. The point here is that although the performance engineer may be certain of a scalability issue, the results are theoretical and will require proof using operational model methods. Generally speaking, the performance engineer will not be able to derive a concrete scalability result from the bottom-up approach.

Operational analysis, or the top-down approach, will provide clearer, more tangible results for the scalability of the system. The results of the test will generally be represented as something like: 100 logged in users, CPU at 80% capacity, memory at 80% capacity and network at 80% capacity. Normalising this view using well-defined, divisible entities is what I’ve found to provide the best view of system scalability.

Let me further clarify what is meant by well-defined, divisible entities. To achieve the most potent results, the software and hardware configuration needs to be as simple as possible and representative of the minimum components of the system; for example, a single node for the JVM, a single node for the webserver and a single node for the database. This forms the fundamental unit of configuration. From this point, the application can be horizontally or vertically scaled. Here is a view of this configuration:

[Figure: the minimal system configuration accessed by virtual users (unitOfScalability.gif)]

The diagram illustrates a simple system configuration and describes the method by which virtual users access the system.

By ensuring that the system remains simple and that each software and hardware component is configured to the minimum level required to access all possible system functions, we can run our Application Simulation Model load into the system and measure the key system metrics. These measured metrics, grouped together and presented as a whole, are what I call the Unit of Scalability. Once this fundamental view is acquired, other scalability attributes are very easily derivable. For example, the Unit of Scalability can be used to determine whether the system scales vertically or horizontally in a linear manner. To determine the horizontal scalability attributes, simply doubling the hardware and software components and executing the same simulation model will reveal the system’s linear scalability capabilities.
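The doubling check reduces to simple arithmetic. The following is a minimal sketch, in JavaScript like the other listings on this blog, assuming that the number of users supported at the same utilisation ceiling is the metric being compared. The function names and the figures for the doubled configuration are illustrative assumptions, not measurements.

```javascript
// Hypothetical Unit of Scalability: the metrics measured on the minimal
// configuration (one web server, one JVM, one database node).
var unit = { nodes: 1, users: 100, cpu: 0.80, memory: 0.80, network: 0.80 };

// Illustrative result of re-running the same simulation model after doubling
// the hardware and software components.
var doubled = { nodes: 2, users: 180, cpu: 0.80, memory: 0.78, network: 0.80 };

// Scaling efficiency: users actually supported divided by the users a
// perfectly linear system would support. 1.0 means linear scaling.
function scalingEfficiency(unit, scaled) {
    var linearUsers = unit.users * (scaled.nodes / unit.nodes);
    return scaled.users / linearUsers;
}

// Units needed for a target load, assuming the measured efficiency holds
// (an assumption that should itself be tested at larger sizes).
function unitsRequired(unit, targetUsers, efficiency) {
    return Math.ceil(targetUsers / (unit.users * efficiency));
}

var efficiency = scalingEfficiency(unit, doubled);
var units = unitsRequired(unit, 1000, efficiency);
```

With these illustrative figures the efficiency comes out at 0.9, and a target of 1,000 users would need 12 units rather than the 10 a perfectly linear system would require.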

Having calculated the Unit of Scalability and further derived the system’s vertical and/or horizontal scalability attributes, we can accurately quote system capacity requirements and, more importantly, present our customers with a simple view of a system’s performance and scalability.

Performance Unit Test; a development concern

March 17th, 2008 by Oscar Huseyin

As a developer, I feel performance is too often neglected by the development team. I’ve been on a few projects that have been really compromised by the performance aspects of a JEE system, specifically because of the lack of performance testing during the development cycles of the SDLC. Often, as performance issues are identified during performance testing (be it transactions not meeting Service Level Agreements, bloated memory profiles, transaction timeouts etc.), the development team will be summoned to identify the performance related issues, mostly at the end of the testing cycles and very late in the SDLC.

So, what does the development team do to identify performance issues in a system? They hypothesise, they reason, they eliminate suspects from the potential problem candidates, then finally settle on bringing out a profiler of choice. Having identified suspected areas of concern, they point the profiler at the functional area and begin capturing runtime behavior.

Each captured profile is analysed by the developer, performance related defects are identified and corresponding defects are raised. These defects are then fixed and released through the testing cycles. This is the classic and well known process of finding performance related defects during performance testing, raising the testing defect, fixing the defect in the development cycle, then finally releasing back to testing. Can we see parallels to our novel approaches of the past? Very waterfall indeed.

In the past, we have looked for innovative ways to put a stop to this type of waterfall development by attempting to identify and rectify as many defects during development as possible. We have developed sophisticated unit testing, integration testing and application automation scripting systems with a view to sufficiently unit testing our code and picking up defects early in development; however, we have neglected the performance aspects of our code.

And it still remains neglected.

A fellow odecee colleague and I decided to have a think about this problem and devise a new standard for our development teams. And on that day, the Performance Unit Test was born.

We began by describing the Performance Unit Test as a developer deliverable which forms a part of the overall Unit Testing process. Its function is to prove that the application’s performance concerns are met. Each Performance Unit Test must be repeatable and (if possible) automated for simple execution. Results must be analysed by the unit tester, who fixes any identified performance problems. Results of the Performance Unit Test form part of the Code Reviews and are verified by the reviewer. The developer must step through the Performance Unit Test results with the reviewer before the components are formally signed off.
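As a rough illustration of what a repeatable, automated Performance Unit Test could look like, here is a minimal sketch in JavaScript, matching the other listings on this blog. The performanceUnitTest helper, the buildReport component and the 5 millisecond budget are all hypothetical; a real test would use the project’s own unit testing framework and an agreed budget.

```javascript
// Hypothetical performance unit test helper: runs `operation` repeatedly
// and fails when the average time per call exceeds the agreed budget.
function performanceUnitTest(name, operation, iterations, budgetMillis) {
    var i;
    // Warm up so one-off start-up costs do not distort the measurement.
    for (i = 0; i < 10; i++) {
        operation();
    }
    var start = Date.now();
    for (i = 0; i < iterations; i++) {
        operation();
    }
    var averageMillis = (Date.now() - start) / iterations;
    return {
        name: name,
        averageMillis: averageMillis,
        budgetMillis: budgetMillis,
        passed: averageMillis <= budgetMillis
    };
}

// Stand-in for the component under test.
function buildReport() {
    var total = 0;
    for (var n = 0; n < 1000; n++) {
        total += n;
    }
    return total;
}

var result = performanceUnitTest("buildReport", buildReport, 1000, 5);
```

The returned object is the kind of result a developer could walk through with the reviewer: what was measured, what the budget was, and whether it was met.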

Employing the Performance Unit Test in the development process can seem expensive at first, given that it extends the development time of a particular component. However, as with an effective Unit Testing implementation, it has the capacity to greatly reduce the risk of performance related issues late in the SDLC and will help ensure that the application code delivered into performance testing is highly performant and optimised.

When to avoid the Container during Unit Testing

January 30th, 2008 by Oscar Huseyin

Over the course of my career, I have been eagerly reading, interviewing colleagues and experimenting with my own unit testing methodologies. It was only recently, that a fellow odecee colleague and I sat in a room and attempted to map out what we thought was a practical and repeatable unit testing methodology that we could take to other projects and cement a position on a unit testing pattern for odecee.

After brainstorming our ideas and categorising all the types of development testing, we were able to define a taxonomy that represented all types of tests that a development team should be creating. After some further refinement, I described the process in an odecee white paper with a view to laying the foundations for odecee’s unit testing methodology.

Central to our discussions was the topic of this blog: when should a developer avoid testing in the container? To better illustrate the question, it’s best if I define what container avoidance is. First and foremost, all JEE applications have a target deployment environment, which typically includes an installation of some JEE container (WebSphere Application Server, Tomcat etc.) to which the application being developed will be deployed. Now, all containers provide foundation services which the application can call upon to access some resource, be it a database, LDAP directory, Connector, JMS provider, or even the humble HTTPRequest object. The application under development will typically have a dependency on one or more resources managed by the container and will therefore need those services during execution. If you want to run your unit tests on an application whose target container manages resources, then testing the application outside the container will require a strategy to provide replacements for the container managed services. The process of deriving a solution that substitutes a fake, be it a stub or mock, for a container service is what I call container avoidance.

One implementation of a container avoidance strategy is to use an off-the-shelf framework like Spring, where each class in your application is configured as a “bean” and defined to have dependencies on other “beans”. Loosely coupling your Java classes in this manner puts you in an ideal situation to abstract all your (potentially) container service providers. Now, swapping out your providers should be a simple configuration task; a task that can be abstracted by your unit testing framework, for example through Spring’s AbstractTransactionalDataSourceSpringContextTests. It seems simple, but it does require some planning and commitment to the testing pattern.

Once all your providers are abstracted, we have achieved container avoidance and can now run our primary test objects outside the container and weave our test code.
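The substitution at the heart of container avoidance can be sketched in a few lines. This example is in JavaScript, like the other listings on this blog, rather than Java and Spring; AccountService, the findBalance contract and the stub data are all hypothetical names chosen for illustration.

```javascript
// Component under test. It depends only on the provider contract
// (an object with a findBalance(accountId) function), not on the container.
function AccountService(accountProvider) {
    this.accountProvider = accountProvider;
}
AccountService.prototype.isOverdrawn = function (accountId) {
    return this.accountProvider.findBalance(accountId) < 0;
};

// Stub that replaces the container managed provider (for example a
// DataSource-backed DAO) when the test runs outside the container.
var stubProvider = {
    balances: { "A-1": 250, "A-2": -40 },
    findBalance: function (accountId) {
        return this.balances[accountId];
    }
};

// Wiring that a DI framework would normally perform from configuration.
var service = new AccountService(stubProvider);
```

Because AccountService never looks up its provider itself, the test can hand it the stub directly; in a Spring application the equivalent swap would be made in the bean definitions used by the test.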

So, now that we have a method to avoid most container services, we’ve managed to reach further into our classes with our unit test code than ever before. Has this process increased the likelihood of a “works-first-time” deploy? No, it certainly has not. In our drive to implement container avoidance, we’ve missed testing other aspects of the system under test, like Deployment Descriptors, System Configurations, View Concerns and many more things that make up the working application.

Although we have definitely achieved an engineering feat, I would characterise container avoidance as an academic task; one which can easily miss the pragmatic viewpoint of unit testing: proving the business function.

Well, where to from here? How far should we go with abstracting our providers? Only datasources? Only EJBs? What about Web Services? It’s a tough question, and the answer depends on your philosophy of unit testing. Complex systems have many moving parts, orchestrated in a perfect harmony that will reliably deliver business value time and time again. Therefore, we should try our hardest to test the business functions rather than the finest code granularity with a view to attaining high levels of code coverage. This means using the container services by deploying your application and testing all facets of the system, including container configuration, in the test cycle. Let’s not forget, if it was not for the services of the container, we would be forced to write multi-threaded applications on every customer engagement. Now that would surely be another engineering feat.

Is Dependency Injection only useful for Unit Testing?

January 7th, 2008 by Oscar Huseyin

The rise in popularity of Dependency Injection over the last three years has been an interesting observation. “Lightweight containers” like Spring and Pico have really intrigued and excited the masses of Java Developers, so much so, that Spring (for example) is automatically adopted and accepted to be mandatory for almost all J2EE applications.

Looking back to when I was first reading about Spring, I clearly recall how the DI concepts and simple semantics of the XML configurations really got my creative juices flowing. I remember thinking about all the possibilities the framework could offer; specifically the idea of configuring all my components so they had no knowledge of their dependencies. Just after I created my “Hello World!” application, I quickly realized that the concepts of the Abstract Factory had just been reinvented, this time under the “Lightweight Container” marketing. Sure, the framework provides externalised definitions of object dependencies and promotes coding to interfaces, but it’s no different from my trusted old Abstract Factory pattern. Ignoring my cautious voice, the shiny marketing got the better of me and I convinced myself Spring could present a better alternative to my novel approaches; I decided to use it on my next project.

After about three months of using Spring in anger, I recall moments where I was enjoying the flexibility I’d attributed to Spring. Spring promoted the definition of loosely coupled Java classes, which really gave me an advantage during unit testing. Heck, with a bit of thought, I could wire every one of my Java classes using Spring bean definitions and define all my dependencies as references to other Spring beans. Now I could pass in Mocks, Stubs, basically anything that implemented the contract. What power! It gave me incredible leverage to test my primary test object by mocking or stubbing my fixtures or dependencies; now, obtaining higher levels of code coverage was really easy. I just had to ensure that all the Java classes that I needed to test were defined as Spring beans, making sure that all the primary test object’s dependencies were also defined as bean references. This way, by overriding specific bean definitions with Mock or Stub ones, I could replace them at runtime to exercise every concern of my application code.

One observation that I made during this time was the reduced usage of the new operator. Looking through my source, everything was injected into my classes and the new operator had practically vanished. I recall the famous “Don’t call us, we’ll call you” cliche whispering in my mind.

Looking back over my first Spring project, I’d say we achieved a test code coverage of about 80%. This was really exciting and I’d give all the credit to our decision to use Spring. A few years later, with a few more gray hairs and a few more Spring enabled projects, I’ve realised that Spring is only an enabler for achieving high unit test code coverage. Sure, Spring provides some really nice JDBC, JNDI, JMS etc. helpers, but the XML configuration aspect that it gives to an application can get really messy; so much so that it can become a real development impedance. How is this possible? Read my Aspects of Spring on a Monolithic Codebase blog; it sums up my growing pains.

So, the one resounding question that I have in my mind is “What real benefit does Spring offer to a system?”. After many discussions and hours of mental deliberation, I categorise the benefits of Spring by percentage as 90% unit testing flexibility and 10% utility.

Where to from here? Well, that depends on the importance you place on code coverage. Period. If you’re aiming for high levels of code coverage, then Spring’s your answer; just expect to deal with a pervasive XML configuration aspect of your codebase.

Proposing alternatives to Spring could be futile given the wide acceptance of the framework. In my humble view, Abstract Factories are formidable alternatives. They can also be configuration driven and, again, can cause a proliferation of externalised dependency descriptors. However, they force you to think about your providers and give you a break from the Spring approach, so you can consider when you actually need to abstract your providers. This way, your decisions to define your configurable components can be made with more pragmatism and care.
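For readers who have not used the pattern in a while, here is a minimal sketch of a configuration driven Abstract Factory, in JavaScript like the other listings on this blog. The factories, the mailer contract and the environment setting are hypothetical and only meant to show the shape of the approach.

```javascript
// Two hypothetical provider families: one for the deployed application,
// one for testing. Both honour the same contract.
var productionFactory = {
    createMailer: function () {
        return { send: function (to) { return "queued for " + to; } };
    }
};
var testFactory = {
    createMailer: function () {
        var sent = [];
        return {
            sent: sent,
            send: function (to) { sent.push(to); return "recorded " + to; }
        };
    }
};

// The Abstract Factory is selected from configuration in exactly one place.
function factoryFor(config) {
    return config.environment === "test" ? testFactory : productionFactory;
}

// Client code asks the factory for a provider and never names a concrete one.
function notifyCustomer(factory, address) {
    return factory.createMailer().send(address);
}

var outcome = notifyCustomer(factoryFor({ environment: "test" }), "a@example.com");
```

Only the providers that genuinely vary between environments get a factory method, which is the deliberate decision the pattern forces on you.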

In conclusion, using Spring has many advantages, most of which are, sadly, concerned with the testability requirements of your codebase. Separating your application tiers by demarcating with a layer of Spring configuration will be my next approach. This way, I can achieve my provider view using the attractive semantics of Spring bean definitions.

Unit testing; how far do you push the envelope?

December 4th, 2007 by Oscar Huseyin

Over the years, I’ve read lots of commentary, white papers, best practice papers and books on the topic of TDD. I’ve heard the rants of many TDD evangelists who preach about how total code coverage brings you closer to code quality perfection and how you’ve failed when you’ve not been able to achieve these goals. Sure, this is an extreme example of evangelical preaching; in actual fact, most of these individuals commonly tone down their hard line views of testing by using words like “pragmatism” and statements like “do what works best”. But why do I feel as if I’ve failed if I have not got 100% code coverage? It’s because I, to some degree, shared some of the religious views about testing.

I’m now at a point where I’m beginning to rethink some of my beliefs about testing after many years in the trenches. So, I’m at the crossroads, settling on a methodology that, I feel, works the best. What level of unit testing is really required to meet the business needs?

I’ll start by analysing two “special interest” projects to see what outcomes they delivered against the business expectations.

Firstly, let me talk about a project that was at one extreme: no mandated position on unit testing. The project was highly successful, and the business expectations were met and exceeded. Donning my evangelist hat, I’d say the project outcomes were a fluke and it was a miracle that we were able to make any changes to the application without having a negative impact on functionality. Looking back, the project was definitely not a fluke; we made lots of changes to the application without any regressive impact. We knew our issues and had the right processes in place to gate-check the application functionality pre-release. For example, every developer had an area of expertise in the application, and before each release they would spend approximately a week testing that functional area and making spot fixes as needed. We were not very clever about our testing methodology, but we delivered on time, on budget and exceeded customer expectations.

Now, let me tell a story of another, very different project, one that’s in stark contrast to the first one. This application had literally 98% code coverage. Unit tests, integration tests, front end screen tests, watertight code reviews, continuous integration, nightly deploys, every agile practice and quality assurance process under the sun. Did the code meet the business expectations? Well, yes; but it was expensive. It took twice as long to develop an application feature, and we mandated near perfect code coverage. Was this approach more successful than my first example? Not really. Sure, we had more confidence in making changes to the code base and had an “immediate view” of the regression impact of a change. But the business paid a price for all of that. A very heavy price. One would think, given the money it cost for development to test the application, that the number of defects would be significantly reduced; but it wasn’t. We had lots of functional and non-functional defects detected by testing which were effectively misinterpretations of business requirements or gaps in the business logic.

Which, from a business perspective, was more successful? Both. The corollary is that a heavily tested application costs lots of money and takes longer to build. This I have seen first hand. So, time to answer the titan question from my own experiences.

As a developer, you need to test the components that you write; there’s no arguing that. Otherwise, how else can you prove the functionality of your components? But just how far should we push the envelope?

My view is simple. We all need to be pragmatic about how we approach our unit testing. We should always stop and ask ourselves “are we going too far with our unit testing?”. As developers, we are faced with this question constantly. We should always do the most to prove our components are functionally correct, but also write the least amount of unit test code to ensure our testing solution remains simple yet effective. After all, a good developer is a lazy one.

Optimise your Web Tier using Semantic Markup

October 27th, 2007 by Oscar Huseyin

Asking anyone about the importance of getting your application’s business tier design and implementation right will draw no objections. A well designed and implemented business tier increases the flexibility and maintainability of your application and shields it from complex and difficult refactoring imposed by changing requirements.

Over the years, I’ve seen well designed and implemented business tiers built through the use of common design patterns such as Command, Data Access Objects, Service Facade etc. If well implemented, these design patterns can increase the maintainability of the application and increase the business agility in the face of evolving business requirements. From my experience, the same cannot be said about the View.

I’ve seen a number of projects fail dismally in the view implementation, where changes are typically complex and difficult to implement. Frameworks like Spring MVC and Struts are great for organising application layers, applying constraints that help avoid abstraction leakage (like business logic bleeding into the view) and providing a clean MVC framework, but they are still not sufficient for Web Tier agility. The technology ecosystem for the Web Tier is vast, with new and constantly evolving frameworks which increase the richness of the web experience. AJAX, JSP, Velocity, Tapestry and JavaServer Faces are all excellent tools for implementing your view, and all present a number of rich APIs to help with the rendering and behaviour of the view.

That said, what I’ve found consistently missing in Web Tier implementations is organisation. So what do I mean by “organisation”? Using the business tier as an example, typically most database access details are hidden behind a Data Access Object API. This way, as long as business services are coupled only to the DAO APIs, changes to the data abstraction code are localised in one area: the implementation of the Data Access Objects. This is the classic Separation Of Concerns. In my experience, concerns of the View have not always been adequately separated.

Most projects that I’ve worked on have not even been able to articulate the View concerns. So, what are the concerns of the View? Well, simple: Markup, Style and Behaviour. Separating these layers in the view is crucial to increasing the flexibility and maintainability of your view. Separating the Markup, Style and Behaviour of an application requires diligence and strict adherence to a few rules:

(1) - Ensure that your markup is limited to structural definitions and conforms to XHTML (Strict, Transitional or Frameset). The rule is to exclude any style or behaviour code. Here is an example of bad markup:

<html>
    <body style="background: #fff; alignment: left;" onclick="doSomeWork()">
        <ul style="list-style-type: none">
            <li>A List Item</li>
        </ul>
    </body>
</html>

The above example mixes all of the View concerns. Firstly, there are style definitions in the <body> tag, where the background is set to some arbitrary value and the alignment is set to left. Some style information has now leaked into the markup and will force the developer to search for and change each instance of the <body> when the application is being “re-skinned”. This same problem is repeated in the unordered list definition, where the style of the list has been hard coded into the markup.

Secondly, the <body> tag has been defined to handle the onclick event from the document body. Similar to the first problem, refactoring the behaviour of the <body> will force the developer to modify every <body> in the application when a change is required in this area.

(2) - Ensure that Style is separated from the markup. This enables your application’s view to be changed with minimal effort. When marking up the document’s block elements, ensure that you use id and class attributes to decouple the markup from the style. This way, with the use of CSS selectors, style can be added to the markup and can be changed across the whole site using a single definition.

(3) - Ensure that Behaviour is separated from the markup. Once implemented, there should be only one JavaScript function call in your markup, which is typically <body onload="onload()">. Given that rich web view implementations require JavaScript, avoiding it is not practical; therefore the attachment of events should be defined in a JavaScript file and called from the onload() call path.

/**
 * The onload() method that separates the markup from the behaviour
 */
function onload() {
    attachEvents();
}

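/**
 * Attachment of all events in the document
 */
function attachEvents() {
    var foo = document.getElementById("foo");
    if (foo.addEventListener) {
        // W3C DOM event model
        foo.addEventListener("click", bar, false);
    } else if (foo.attachEvent) {
        // Older versions of Internet Explorer
        foo.attachEvent("onclick", bar);
    }
    // Attach all the remaining events for the document
}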

/**
 * Event handlers.
 */
function bar() {
    doSomethingForOnClickEvent();
}

There you have it. Clean separation of all your View concerns. If you ensure that Markup, Style and Behaviour have been adequately separated, then you will have a more agile View, where skinning the application can be performed in far less time.

N+1 has leaked into my service interfaces

October 11th, 2007 by Oscar Huseyin

About two years ago, I (like many others) bought and read Hibernate In Action. It was easily the most definitive book on the Object to Relational Mapping tool that I could find. Apart from being really well written, the book was a great reference text which could be used in the trenches to configure and use Hibernate.

Now, a few years after using Hibernate in angst, I have realised that the decision to use ORM has large drawbacks; larger than I initially envisioned. Notably, the infamous N+1 selects problem can increase development times, as developers spend a large portion of their time “ironing out” the performance issues related to the ORM implied constraints.

At first, the N+1 problem presented performance issues that would often starve the JVM of memory whilst fetching and hydrating objects from the database. The solution seems trivial, in that defining lazy associations would defer the loading of objects until they were needed, consequently restricting the number of objects in the object graph. However, this is not the be-all and end-all of problems relating to N+1.

Looking back, I had a sense of victory after we performed a first pass to detail the domain associations with a view to implementing a more performant database abstraction. However, as the system functionality increased, the requirements on the domain model unfolded to create more and more “scenario based” object associations. To illustrate this point, let me give a Hello World! example. If my domain object Customer has associations to Order, Item and Address, then my associations could be represented as:

Customer  (1) ---------- (*) Order  (1) --------- (*) Item
                |
                --------------(*) Address

Now, if I don’t define any lazy associations, then when I load my Customer from the database, Hibernate will retrieve all Orders, Items and Addresses. By adding lazy associations to each relationship, I can now control the loading of Orders, Items and Addresses as I need them. Typically, this is achieved by “touching” the Collection that I need to load from the Customer. Simple, right?

In my example, one possible “scenario” is the non-lazy one, i.e. when you load a Customer, all associated objects are also loaded. Another scenario is a lazy one: Customers with Orders and Items but no Addresses. I’ve described two scenarios here. Can you see any more? I can. Customer with Orders only (i.e. no Items). Any more? Yep, Customer with Addresses only. I can go on and on. So, the number of possible object loading scenarios is a function of the number of associations in the domain model. Now, that can be a very big number! Exponential, actually.
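To put a number on that claim, here is a minimal Java sketch that enumerates the load scenarios for the Customer example. The class, enum and method names are my own for illustration and are not part of Hibernate. It treats each association as either loaded or not, and discards combinations that cannot occur, such as Items without Orders.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: none of these names come from Hibernate.
class LoadScenarios {

    // The three associations from the Customer example.
    enum Association { ORDERS, ITEMS, ADDRESSES }

    // Items hang off Orders, so Items cannot be loaded without Orders.
    static boolean isValid(Set<Association> loaded) {
        return !loaded.contains(Association.ITEMS) || loaded.contains(Association.ORDERS);
    }

    // Enumerate every subset of associations and keep the reachable ones.
    static List<Set<Association>> enumerate() {
        Association[] all = Association.values();
        List<Set<Association>> scenarios = new ArrayList<>();
        for (int mask = 0; mask < (1 << all.length); mask++) {
            Set<Association> loaded = EnumSet.noneOf(Association.class);
            for (int i = 0; i < all.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    loaded.add(all[i]);
                }
            }
            if (isValid(loaded)) {
                scenarios.add(loaded);
            }
        }
        return scenarios;
    }

    public static void main(String[] args) {
        for (Set<Association> scenario : enumerate()) {
            System.out.println("Customer + " + scenario);
        }
    }
}
```

Even this tiny model has six distinct scenarios, and every further association hanging directly off Customer would double the count.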

Given that this aspect of ORM must be solved to increase the performance and scalability of the domain, how is this typically implemented? That’s the focus of this blog: N+1 leaking into the service methods.

To continue with the above example, if I create a service to retrieve Customers, I could (without considering the lazy associations) create a service named CustomerManager with a single method named getCustomer(int customerId). However, the service interface is certain to be non-performant, as my Customer will be loaded with Orders, Items and Addresses. Now, if I want to specialise the object graph that is returned, I need to add more methods to the CustomerManager service: getCustomerWithOrders(int customerId), getCustomerWithOrdersAndItems(int customerId), getCustomerWithOrdersItemsAndAddresses(int customerId), and so forth.

So, from a single method in my CustomerManager service to four! That’s what I call an abstraction leak. My clients are now exposed to the shortcomings of ORM and I have severely polluted my service interface.
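Written out in Java, the polluted interface looks something like the sketch below. The method names are the ones from the text; the Customer stand-in and the in-memory implementation are assumptions of mine, there only so the example runs without a Hibernate session. The stand-in simply records which associations each method would have initialised.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical demo entry point.
class ServiceLeakDemo {
    public static void main(String[] args) {
        CustomerManager manager = new InMemoryCustomerManager();
        System.out.println(manager.getCustomerWithOrders(42).loaded);
    }
}

// Stand-in domain object: records which associations were initialised.
class Customer {
    final int id;
    final Set<String> loaded;

    Customer(int id, String... loadedAssociations) {
        this.id = id;
        this.loaded = new LinkedHashSet<>(Arrays.asList(loadedAssociations));
    }
}

// One method per object graph shape: the fetch strategy is now part of the contract.
interface CustomerManager {
    Customer getCustomer(int customerId);
    Customer getCustomerWithOrders(int customerId);
    Customer getCustomerWithOrdersAndItems(int customerId);
    Customer getCustomerWithOrdersItemsAndAddresses(int customerId);
}

// In-memory implementation, used only to make the sketch runnable.
class InMemoryCustomerManager implements CustomerManager {
    public Customer getCustomer(int customerId) {
        return new Customer(customerId);
    }
    public Customer getCustomerWithOrders(int customerId) {
        return new Customer(customerId, "orders");
    }
    public Customer getCustomerWithOrdersAndItems(int customerId) {
        return new Customer(customerId, "orders", "items");
    }
    public Customer getCustomerWithOrdersItemsAndAddresses(int customerId) {
        return new Customer(customerId, "orders", "items", "addresses");
    }
}
```

Every caller now has to know which shape of Customer it needs before it can pick a method, which is exactly the knowledge the service was supposed to hide.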

Avoiding this service pollution is not a concern of development. This responsibility rests squarely with the application architect. Constraints imposed by the application architecture, mandated by the use of, say, EJB, are the primary cause of the abstraction leakage. Retrieving object graphs from Stateless Session Beans will directly present the ORM shortcomings for clients to deal with. However, services deployed in, say, the Servlet container will remove the need to pollute service interfaces, but create other issues, like holding onto resources (such as a database connection) for lengthy periods of time to allow the service implementations to “retrieve” the lazily loaded associations as needed.

In conclusion, living with Hibernate is costly. The semantic definitions of your interfaces will resemble the object graphs that are being fetched and returned. I’ve found this to be really messy, and it forces unwanted constraints on otherwise simple service definitions.

Dynamic LDAP group membership; increasing the flexibility of your security model

October 3rd, 2007 by Oscar Huseyin

How’s this for a problem: having users that can switch between security contexts in a J2EE construct while maintaining a linear number of user-to-group associations in LDAP. What does this mean? To put it simply, if I’m user Foo and I can (from a security perspective) take on the mutually exclusive roles of administrator, accountant and auditor, this must ultimately equate to group memberships; more specifically, LDAP group memberships. This problem is not solved by simply assigning the user Foo to all the groups, because then Foo would have access to an auditor’s functions after logging in as an accountant, and not all accountants are auditors. If we were to attempt a solution modelling the permissions using LDAP, we would have to create users Foo, Foo' and Foo'', which are given group memberships of administrator, accountant and auditor respectively. From an LDAP perspective this solution is unmanageable, because as the requirements for user permission levels change, the number of unique user-to-group combinations would increase exponentially, resulting in an unmanageable LDAP system.

A simple solution to this problem is the use of dynamic groups. One small problem though: a dynamic group is only a concept and, technically speaking, cannot be modelled in OpenLDAP, eTrust and IBM Directory Server. Therefore, dynamic group functionality must be built into the application security architecture.

Firstly, all the groups must be defined a-priori and stored in an LDAP server, e.g. cn=mycompany-group-banking, ou=groups,ou=policies,o=mycompany. Users in the system are not assigned any group memberships in LDAP itself, which means that, from an LDAP perspective, they don’t have any group memberships. Now, to achieve dynamic groups, the “system access profile” needs to be read from a store (typically the application database, as the user-to-profile relationships are highly normalised) and then used to build the user’s security credential. For example, the database is queried to read the user’s group memberships (which map one-to-one to the LDAP groups), and these are then used to construct the platform-specific security credential: the JAAS Subject in a J2EE construct.
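As a rough sketch of that lookup, the Java below translates a user’s profiles into LDAP group names. Everything in it is an assumption for illustration: the in-memory map stands in for the application database, and the naming convention (one group called mycompany-group-<profile> under the base DN from the example) is only inferred from the example group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and data are illustrative assumptions.
class DynamicGroups {

    // Base DN copied from the example group.
    static final String GROUP_BASE = "ou=groups,ou=policies,o=mycompany";

    // In-memory stand-in for the application database: user id -> profile names.
    static final Map<String, List<String>> PROFILES = new HashMap<>();
    static {
        PROFILES.put("foo", Arrays.asList("banking", "accounting"));
    }

    // Assumed naming convention: one LDAP group per profile name.
    static String toGroupDn(String profileName) {
        return "cn=mycompany-group-" + profileName + "," + GROUP_BASE;
    }

    // Read the user's profiles and translate each one to its LDAP group DN.
    static List<String> readGroupsFromDatabase(String userId) {
        List<String> groups = new ArrayList<>();
        for (String profile : PROFILES.getOrDefault(userId, Collections.<String>emptyList())) {
            groups.add(toGroupDn(profile));
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(readGroupsFromDatabase("foo"));
    }
}
```

Because the memberships live in the application database, switching Foo from accountant to auditor is a data change in one place and LDAP is never touched.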

Confused? Perhaps it’s time for some concrete examples. WebSphere Application Server has an interface which can be configured to allow credential construction from a “trusted” source. This mechanism is called the Trust Association Interceptor (TAI). You could also use a custom Login Module, but I find them a little cumbersome to work with. IBM developerWorks has a great article on this exact topic. If you want to assert the identity of a user who has not been authenticated by WebSphere, you can configure the TAI so that it fires when a client attempts to access a secured resource. At this time, the TAI has the ability to construct a security credential and assert the identity to WAS.

/*
 * Trust Association Interceptor interface method that will be called by WebSphere.
 */
public TAIResult negotiateValidateandEstablishTrust(
                            HttpServletRequest req,
                            HttpServletResponse resp) throws WebTrustAssociationFailedException {
    String userid = req.getParameter("userid");

    String password = req.getParameter("password");

    try {
        // Compute identity information
        InitialContext ctx = new InitialContext();

        UserRegistry reg = (UserRegistry) ctx.lookup("UserRegistry");

        // Check the user's identity in LDAP, e.g. coarse-grained authentication
        boolean validUser = authenticateUsingLDAP(userid, password, reg);

        if (!validUser) {
            throw new WebTrustAssociationFailedException("User could not be authenticated.");
        }

        // Load the user's group memberships from the database
        Collection groups = readGroupsFromDatabase(userid);

        // Having read the groups from the database for the authenticated user
        Subject subject = constructSubject(userid, groups);

        // Now, assert the identity to WebSphere Application Server
        return TAIResult.create(HttpServletResponse.SC_OK, "notused", subject);

    } catch (NamingException e) {
        // The user registry could not be looked up, so fail the trust association
        Log.info("Exception caught whilst attempting to authenticate user");
        throw new WebTrustAssociationFailedException(e.getMessage());
    }
}

Now that we have a security credential, WebSphere will automatically propagate the Subject from the authenticated caller to the callee for all invocations of secured EJBs, Servlets, JSPs, etc. For example, when an EJB is invoked in another WAS instance in the Cell, the WebSphere EJB container (on the callee side) will intercept the EJB call, check for the presence of the LTPA token on the call, and check the local WAS Security Store for the (potentially) cached Subject. If the Subject is not present in the Security Store, it then makes a call (via JMX) back to the caller’s WebSphere instance to retrieve it.

Constructing the subject is also simple. The key to the design, as mentioned before, is to ensure your user group memberships are in the application domain and not in LDAP. Here’s an example:

private Subject constructSubject(String userId, Collection groups) {
    // Create the JAAS Subject
    Subject subject = new Subject();

    // Data structure to store credentials
    Hashtable hashtable = new Hashtable();
    hashtable.put(AttributeNameConstants.WSCREDENTIAL_UNIQUEID, userId + System.currentTimeMillis());
    hashtable.put(AttributeNameConstants.WSCREDENTIAL_SECURITYNAME, userId);

    // This is the point which implements the Dynamic Groups.  WebSphere reads the
    // group memberships using the WSCREDENTIAL_GROUPS key.
    hashtable.put(AttributeNameConstants.WSCREDENTIAL_GROUPS, groups);

    // Add all credential information to the Subject to complete Subject construction
    subject.getPublicCredentials().add(hashtable);

    return subject;
}

One point to be careful about. The groups that you insert into the public credential set of the Subject should exist and match the ones in LDAP. An example of a group is:


cn=mycompany-group-accounting, ou=groups,ou=policies,o=mycompany
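One way to guard against a mismatch is to compare the groups read from the database with the groups defined in LDAP before building the Subject. The sketch below is a hypothetical helper, not a WebSphere API. It compares the names as plain strings, so both sides are assumed to use exactly the same spelling and spacing.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: flags database groups that have no LDAP counterpart.
class GroupCheck {

    // Returns the groups that are NOT defined in LDAP; an empty set means all match.
    static Set<String> unknownGroups(Collection<String> fromDatabase,
                                     Collection<String> definedInLdap) {
        Set<String> unknown = new LinkedHashSet<>(fromDatabase);
        unknown.removeAll(definedInLdap);
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> unknown = unknownGroups(
                Arrays.asList("cn=mycompany-group-accounting, ou=groups,ou=policies,o=mycompany",
                              "cn=mycompany-group-typo, ou=groups,ou=policies,o=mycompany"),
                Arrays.asList("cn=mycompany-group-accounting, ou=groups,ou=policies,o=mycompany"));
        System.out.println("Groups missing from LDAP: " + unknown);
    }
}
```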

Binary Dependencies? But, my domain objects are POJO’s?

September 24th, 2007 by Oscar Huseyin

Recently, I encountered a very interesting fact about using Hibernate and EJB. Our design was simple: Container Managed Transactions and a remote interface provided by EJB, Spring-wired business objects, and Hibernate as our data access layer. Simple and neat. Well, sort of.

It had been smooth sailing: we had delivered a large amount of application functionality, and the time came to integrate a third party application. We provided a specific integration API (through EJB) to abstract the complexity of our business processes. Our API was clean and simple to use. We spent a while “hooking up” the third party application and, just as we thought we were going to pull it off: unmarshalling exceptions. What the?

We looked for the usual candidates: had we packaged the incorrect version of the domain objects that we delivered to the vendor? Nope. Did we have an incorrect version of the domain objects ourselves? Nope. Well, what could it be? Looking a little closer at the exception stack trace revealed the truth: incompatible Hibernate proxy objects! Oh no, Hibernate is in my POJO.

A quick check of the vendor libraries revealed a truth we were hoping would not be true: the vendor was using Hibernate as well, and no prizes for guessing what the problem was. The vendor had a different version of Hibernate.

This made me reflect on the “discussions” we’d had with Architecture about the use of Data Transfer Objects (DTOs). An Object Oriented purist would strongly argue that DTOs are, in fact, not objects, as they have no behaviour, and that DTOs increase maintenance costs due to the impact they have on change. But if your architecture has mandated isolated classloaders (EJB facades), one must pass “pure” objects from one classloader to the next. What does that mean? It’s simple: the binary dependency should only be on the objects that are defined in the API and nothing more.

So, my next question was: where the hell is Hibernate hiding in my domain objects? Collections. To be more specific, Hibernate Collections. Due to the many pitfalls of ORM, Hibernate has “solved” the problem by creating proxies for Collections of objects. This is the Hibernate team’s solution to the N+1 selects problem. Only for me, this particular aspect of ORM had leaked into our architecture and exposed a hole that was not covered. Now we have a submarine with a leaky hull.
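One common way to plug this kind of hole is to copy ORM-managed collections into plain JDK collections before the object graph leaves the service, so the client only needs the JDK and the API classes on its classpath. The Java below is a minimal sketch of that idea and is not what we had in place. OrmManagedList is a hypothetical stand-in for a Hibernate collection class, so the example runs without Hibernate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that strips ORM-specific collection types from a result.
class GraphDetacher {

    // Copy the elements into a plain java.util.ArrayList so that only JDK
    // collection types cross the classloader boundary.
    static <E> List<E> toPlainList(List<E> source) {
        return source == null ? null : new ArrayList<>(source);
    }

    public static void main(String[] args) {
        List<String> managed = new OrmManagedList<>();
        managed.add("order-1");
        List<String> plain = toPlainList(managed);
        System.out.println(managed.getClass().getName() + " -> " + plain.getClass().getName());
    }
}

// Stand-in for an ORM-supplied collection class that the client may not have,
// or may have in a different version.
class OrmManagedList<E> extends ArrayList<E> {
    private static final long serialVersionUID = 1L;
}
```

The copy has to happen while the Hibernate session is still open, because iterating an uninitialised lazy collection triggers its load. It also does nothing about lazily loaded entity proxies among the elements, which would need the same treatment.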