<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments for odecee blogs</title>
	<atom:link href="http://www.odecee.com.au/blogs/?feed=comments-rss2" rel="self" type="application/rss+xml" />
	<link>http://www.odecee.com.au/blogs</link>
	<description>Sharing our thoughts</description>
	<pubDate>Tue, 07 Sep 2010 18:07:49 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.1</generator>
		<item>
		<title>Comment on Is Dependency Injection only useful for Unit Testing? by Bill Bartmann</title>
		<link>http://www.odecee.com.au/blogs/?p=19#comment-2337</link>
		<dc:creator>Bill Bartmann</dc:creator>
		<pubDate>Thu, 03 Sep 2009 04:33:41 +0000</pubDate>
		<guid isPermaLink="false">http://apollo.odecee.com.au/blogs/?p=19#comment-2337</guid>
		<description>Cool site, love the info.</description>
		<content:encoded><![CDATA[<p>Cool site, love the info.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on When Prepared Statements Go Bad by Brett Henderson</title>
		<link>http://www.odecee.com.au/blogs/?p=134#comment-2324</link>
		<dc:creator>Brett Henderson</dc:creator>
		<pubDate>Mon, 31 Aug 2009 21:57:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.odecee.com.au/blogs/?p=134#comment-2324</guid>
		<description>Thanks for the comments, it's interesting to hear how others have dealt with the same problem.  The other thing I've noticed is how the solutions tend to be different for different database engines.  My background has been primarily with Oracle so PostgreSQL has been a new experience for me.

I've experimented more since writing the initial blog in an attempt to find a better solution.  PostgreSQL does have a "prepareThreshold" parameter on connections and individual statements that provides control over whether named or unnamed statements are created on the server.  Named statements create a query plan at the time of creation whereas unnamed statements create a query plan with knowledge of the bind variables.  This functionality would normally solve the problem.  Unfortunately in my case I'm also using a result set batch size of 10000 which forces the use of named statements.  I have to set the batch size in order to keep memory usage at a minimum, Osmosis is designed to be "streamy" and can handle arbitrary data set sizes.

The other avenue I've pursued is using temporary tables.  I first create a temporary table containing the id and version of the records I'm interested in (ie. the sub-select portion of the query).  I then ANALYZE the table to update its query planner statistics and then select from the nodes table joining to this temporary table instead of using a sub-select.  This means the final query doesn't use bind parameters at all, and it also allows me to re-use the temporary table in subsequent queries.  The initial temp table creation query uses bind variables but doesn't return a result set which means that unnamed server statements are used and bind variables are taken into account.  The temp table creation is working well, but for some reason the retrieval query (joining to the temp table) still isn't producing good query plans in some cases although I'm going to get the DBA to double check my results.

For now I'm using the temporary table approach which avoids bind variable issues but I'm still forcing the query planner to avoid sequential scans, hash joins and merge joins.  This appears to be performing very well, but I'll continue to work on it with the DBA to see if we can get the database to produce sane query plans on its own.</description>
		<content:encoded><![CDATA[<p>Thanks for the comments, it&#8217;s interesting to hear how others have dealt with the same problem.  The other thing I&#8217;ve noticed is how the solutions tend to be different for different database engines.  My background has been primarily with Oracle so PostgreSQL has been a new experience for me.</p>
<p>I&#8217;ve experimented more since writing the initial blog in an attempt to find a better solution.  PostgreSQL does have a &#8220;prepareThreshold&#8221; parameter on connections and individual statements that provides control over whether named or unnamed statements are created on the server.  Named statements create a query plan at the time of creation whereas unnamed statements create a query plan with knowledge of the bind variables.  This functionality would normally solve the problem.  Unfortunately in my case I&#8217;m also using a result set batch size of 10000 which forces the use of named statements.  I have to set the batch size in order to keep memory usage at a minimum, Osmosis is designed to be &#8220;streamy&#8221; and can handle arbitrary data set sizes.</p>
<p>The other avenue I&#8217;ve pursued is using temporary tables.  I first create a temporary table containing the id and version of the records I&#8217;m interested in (ie. the sub-select portion of the query).  I then ANALYZE the table to update its query planner statistics and then select from the nodes table joining to this temporary table instead of using a sub-select.  This means the final query doesn&#8217;t use bind parameters at all, and it also allows me to re-use the temporary table in subsequent queries.  The initial temp table creation query uses bind variables but doesn&#8217;t return a result set which means that unnamed server statements are used and bind variables are taken into account.  The temp table creation is working well, but for some reason the retrieval query (joining to the temp table) still isn&#8217;t producing good query plans in some cases although I&#8217;m going to get the DBA to double check my results.</p>
<p>For now I&#8217;m using the temporary table approach which avoids bind variable issues but I&#8217;m still forcing the query planner to avoid sequential scans, hash joins and merge joins.  This appears to be performing very well, but I&#8217;ll continue to work on it with the DBA to see if we can get the database to produce sane query plans on its own.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on When Prepared Statements Go Bad by Ashitkin Alexander</title>
		<link>http://www.odecee.com.au/blogs/?p=134#comment-2311</link>
		<dc:creator>Ashitkin Alexander</dc:creator>
		<pubDate>Sun, 30 Aug 2009 10:48:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.odecee.com.au/blogs/?p=134#comment-2311</guid>
		<description>It's not the problem of java, jdbc and perpared statements. it's a problem of database imho how to handle binding variables and how to cost cardinality and selectivity.</description>
		<content:encoded><![CDATA[<p>It&#8217;s not the problem of java, jdbc and perpared statements. it&#8217;s a problem of database imho how to handle binding variables and how to cost cardinality and selectivity.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on When Prepared Statements Go Bad by Adam Malter</title>
		<link>http://www.odecee.com.au/blogs/?p=134#comment-2308</link>
		<dc:creator>Adam Malter</dc:creator>
		<pubDate>Sat, 29 Aug 2009 15:22:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.odecee.com.au/blogs/?p=134#comment-2308</guid>
		<description>It helps to think of this more generally. When you run the query in your SQL editor (or otherwise as a full statement), you have resolved all the parameters. Therefore, the database can look into it's statistics and base the query plan on what exact data you are seeking.

When you tell the database to do a prepared statement, you specifically are instructing it to create the best possible query path for *any* possible value that could be bound into the parameters. For tables and indexes with highly skewed statistics, this often means that the average case query plan is non-performant with the 'outlier' case. (Which doesn't seem like an outlier to you, because you're providing the values right then and there)

Your solution analysis is 100% correct, although, I would warn against tuning down the overall ability of the database to optimize prepared statements. That might help this case, but the optimizer in modern databases has been tuned well for the general case by many years of hard lessons. Instead, I'd look at your solution 2 - or, do that same type of work in code. It's probably only a single parameter that's causing the skew. Instead of binding it using a setParameter(), use a Velocity template or such to 'SQL Inject' it in (being very careful if it's user supplied data).

This puts extra pressure on the number of prepared statements the database has to keep track of, but, if you tune it for only those skewed values, you can get the results you want, while still enjoying the benefits of prepared statements for the rest of your parameters. 

As an aside, if you use a java based SQL tool like Aqua Data Studio, you'll often be able to run the explain query with the parameters as '?' - This should have given you the nasty query plan you eventually saw. Our team has learned the hard way to always run an explain in that mode against production statistics. It is the only way to truly see what's going to happen on the database.</description>
		<content:encoded><![CDATA[<p>It helps to think of this more generally. When you run the query in your SQL editor (or otherwise as a full statement), you have resolved all the parameters. Therefore, the database can look into it&#8217;s statistics and base the query plan on what exact data you are seeking.</p>
<p>When you tell the database to do a prepared statement, you specifically are instructing it to create the best possible query path for *any* possible value that could be bound into the parameters. For tables and indexes with highly skewed statistics, this often means that the average case query plan is non-performant with the &#8216;outlier&#8217; case. (Which doesn&#8217;t seem like an outlier to you, because you&#8217;re providing the values right then and there)</p>
<p>Your solution analysis is 100% correct, although, I would warn against tuning down the overall ability of the database to optimize prepared statements. That might help this case, but the optimizer in modern databases has been tuned well for the general case by many years of hard lessons. Instead, I&#8217;d look at your solution 2 - or, do that same type of work in code. It&#8217;s probably only a single parameter that&#8217;s causing the skew. Instead of binding it using a setParameter(), use a Velocity template or such to &#8216;SQL Inject&#8217; it in (being very careful if it&#8217;s user supplied data).</p>
<p>This puts extra pressure on the number of prepared statements the database has to keep track of, but, if you tune it for only those skewed values, you can get the results you want, while still enjoying the benefits of prepared statements for the rest of your parameters. </p>
<p>As an aside, if you use a java based SQL tool like Aqua Data Studio, you&#8217;ll often be able to run the explain query with the parameters as &#8216;?&#8217; - This should have given you the nasty query plan you eventually saw. Our team has learned the hard way to always run an explain in that mode against production statistics. It is the only way to truly see what&#8217;s going to happen on the database.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on When Prepared Statements Go Bad by Doug</title>
		<link>http://www.odecee.com.au/blogs/?p=134#comment-2306</link>
		<dc:creator>Doug</dc:creator>
		<pubDate>Fri, 28 Aug 2009 21:26:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.odecee.com.au/blogs/?p=134#comment-2306</guid>
		<description>Great post.
I have run into this kind of issue in production with Oracle. But the background and symptom are almost the same: run fine in sqlplus but slow as hell in Java code.
What happen is the Oracle optimizer picks the query plan based on the bind values of the first query. If the first query turns out to use "bad" variables (those with low histogram) subsequent queries will be slow. This in fact is a known issue in Oracle world called Bind Picking. You can find more information here http://www.pythian.com/news/867/stabilize-oracle-10gs-bind-peeking-behaviour-by-cutting-histograms

Our solution was we cut the histogram on those troublesome columns and stop using "auto sample size" when analyzing our schema.
Cheers</description>
		<content:encoded><![CDATA[<p>Great post.<br />
I have run into this kind of issue in production with Oracle. But the background and symptom are almost the same: run fine in sqlplus but slow as hell in Java code.<br />
What happen is the Oracle optimizer picks the query plan based on the bind values of the first query. If the first query turns out to use &#8220;bad&#8221; variables (those with low histogram) subsequent queries will be slow. This in fact is a known issue in Oracle world called Bind Picking. You can find more information here <a href="http://www.pythian.com/news/867/stabilize-oracle-10gs-bind-peeking-behaviour-by-cutting-histograms" rel="nofollow">http://www.pythian.com/news/867/stabilize-oracle-10gs-bind-peeking-behaviour-by-cutting-histograms</a></p>
<p>Our solution was we cut the histogram on those troublesome columns and stop using &#8220;auto sample size&#8221; when analyzing our schema.<br />
Cheers</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Binary Dependencies? But, my domain objects are POJO&#8217;s? by zqudlyba</title>
		<link>http://www.odecee.com.au/blogs/?p=14#comment-1806</link>
		<dc:creator>zqudlyba</dc:creator>
		<pubDate>Wed, 13 May 2009 07:02:12 +0000</pubDate>
		<guid isPermaLink="false">http://apollo.odecee.com.au/blogs/?p=14#comment-1806</guid>
		<description>See Remote Lazy Loading in Hibernate

http://www.theserverside.com/tt/articles/article.tss?l=RemoteLazyLoadinginHibernate</description>
		<content:encoded><![CDATA[<p>See Remote Lazy Loading in Hibernate</p>
<p><a href="http://www.theserverside.com/tt/articles/article.tss?l=RemoteLazyLoadinginHibernate" rel="nofollow">http://www.theserverside.com/tt/articles/article.tss?l=RemoteLazyLoadinginHibernate</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Performance Unit Test; a development concern by odecee blogs &#187; Blog Archive &#187; The Performance Architect</title>
		<link>http://www.odecee.com.au/blogs/?p=22#comment-1797</link>
		<dc:creator>odecee blogs &#187; Blog Archive &#187; The Performance Architect</dc:creator>
		<pubDate>Tue, 12 May 2009 10:45:42 +0000</pubDate>
		<guid isPermaLink="false">http://apollo.odecee.com.au/blogs/?p=22#comment-1797</guid>
		<description>[...] and Component Performance Testing patterns.  An implementation of this pattern is what l call the Performance Unit Test which l discuss in detail in my blog [...]</description>
		<content:encoded><![CDATA[<p>[...] and Component Performance Testing patterns.  An implementation of this pattern is what l call the Performance Unit Test which l discuss in detail in my blog [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on N+1 has leaked into my service interfaces by odecee blogs &#187; Blog Archive &#187; Simplicity and the Tree Walker</title>
		<link>http://www.odecee.com.au/blogs/?p=16#comment-837</link>
		<dc:creator>odecee blogs &#187; Blog Archive &#187; Simplicity and the Tree Walker</dc:creator>
		<pubDate>Fri, 10 Oct 2008 23:50:45 +0000</pubDate>
		<guid isPermaLink="false">http://apollo.odecee.com.au/blogs/?p=16#comment-837</guid>
		<description>[...] list would still hold strong. This made me think about my previous blog entry on a similar topic; N+1 has leaked into my service interfaces. Does the Tree Walker stop the ORM constraints from leaking in to my service interfaces? Well, no. [...]</description>
		<content:encoded><![CDATA[<p>[...] list would still hold strong. This made me think about my previous blog entry on a similar topic; N+1 has leaked into my service interfaces. Does the Tree Walker stop the ORM constraints from leaking in to my service interfaces? Well, no. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Performance Unit Test; a development concern by Charlie Weiblen</title>
		<link>http://www.odecee.com.au/blogs/?p=22#comment-344</link>
		<dc:creator>Charlie Weiblen</dc:creator>
		<pubDate>Mon, 17 Mar 2008 12:23:15 +0000</pubDate>
		<guid isPermaLink="false">http://apollo.odecee.com.au/blogs/?p=22#comment-344</guid>
		<description>As a performance engineer, "I feel performance is too often neglected by the development team!"

The Performance Unit Test concept is interesting to me... but I foresee problems where it is difficult to specify explicit performance criteria for a unit test.  Without meaningful and specific pass/fail criteria, a unit test is just a to-do item to check off a list.  Often, it can be difficult to get SLAs defined for the end-to-end application/transaction/etc.  How did you go about determining SLAs for low-level performance unit tests?

I think it would be more practical and lower effort (cost) to implement performance integration tests, ideally as part of a continuous integration build process.</description>
		<content:encoded><![CDATA[<p>As a performance engineer, &#8220;I feel performance is too often neglected by the development team!&#8221;</p>
<p>The Performance Unit Test concept is interesting to me&#8230; but I foresee problems where it is difficult to specify explicit performance criteria for a unit test.  Without meaningful and specific pass/fail criteria, a unit test is just a to-do item to check off a list.  Often, it can be difficult to get SLAs defined for the end-to-end application/transaction/etc.  How did you go about determining SLAs for low-level performance unit tests?</p>
<p>I think it would be more practical and lower effort (cost) to implement performance integration tests, ideally as part of a continuous integration build process.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Optimise your Web Tier using Sematic Markup by Greg Foster</title>
		<link>http://www.odecee.com.au/blogs/?p=17#comment-92</link>
		<dc:creator>Greg Foster</dc:creator>
		<pubDate>Wed, 06 Feb 2008 07:19:59 +0000</pubDate>
		<guid isPermaLink="false">http://apollo.odecee.com.au/blogs/?p=17#comment-92</guid>
		<description>Nicely described. My project had to "re-skin" the our application and it turned out to be an almost complete re-write of the front end.  I also employed something very similar to what you have described using Velocity which l had lots of success with.

One thing that l did notice though... the JavaScript libraries were always a little hard to follow.  For example, when we added event handlers to our HTML elements, we added methods to a JavaScript file i.e. they were not defined in the markup (HTML).  Did you have a solution to this problem?</description>
		<content:encoded><![CDATA[<p>Nicely described. My project had to &#8220;re-skin&#8221; the our application and it turned out to be an almost complete re-write of the front end.  I also employed something very similar to what you have described using Velocity which l had lots of success with.</p>
<p>One thing that l did notice though&#8230; the JavaScript libraries were always a little hard to follow.  For example, when we added event handlers to our HTML elements, we added methods to a JavaScript file i.e. they were not defined in the markup (HTML).  Did you have a solution to this problem?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
