<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>coder . cl &#187; sql</title>
	<atom:link href="http://coder.cl/category/programming/sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://coder.cl</link>
	<description>web developer &#38; system programmer</description>
	<lastBuildDate>Sat, 04 Feb 2012 12:07:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>query optimizations, quick recipes</title>
		<link>http://coder.cl/2010/07/query-optimizations-quick-recipes/</link>
		<comments>http://coder.cl/2010/07/query-optimizations-quick-recipes/#comments</comments>
		<pubDate>Fri, 02 Jul 2010 13:13:24 +0000</pubDate>
		<dc:creator>Daniel Molina Wegener</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://coder.cl/?p=694</guid>
		<description><![CDATA[Optimizing SQL queries is a fairly simple task: you just need to follow a few rules about how you order the filtering criteria and how you write your select statements. I will try to explain how you can optimize your queries in this post. named columns Use named columns instead of the * wildcard. For example: [...]]]></description>
			<content:encoded><![CDATA[<p>Optimizing SQL queries is a fairly simple task: you just need to follow a few rules about how you order the filtering criteria and how you write your select statements. I will try to explain how you can optimize your queries in this post.</p>
<p><span id="more-694"></span></p>
<p></p>
<h3>named columns</h3>
<p>Use named columns instead of the <strong>*</strong> wildcard. For example:</p>
<pre class='brush: sql;'>
select
    t.*
from
    tbl as t
where
    t.tab_date > current_date - interval 1 month
order by
    t.tab_date desc;
</pre>
<p>This will usually be slower than the following, because every column has to be read and sent to the client:</p>
<pre class='brush: sql;'>
select
    t.tab_id,
    t.tab_state,
    t.tab_date,
    t.tab_name
from
    tbl as t
where
    t.tab_date > current_date - interval 1 month
order by
    t.tab_date desc
</pre>
<p></p>
<h3>join filtering</h3>
<p>If you filter in the join condition, your query will usually be faster, because rows are discarded while the join is being built. For example:</p>
<pre class='brush: sql;'>
select
    t1.tab_id,
    coalesce(t2.tab_date, current_date) as tab_date,
    t1.tab_name
from
    table1 as t1
        left join table2 as t2 on (t1.tab_id = t2.tab_tab_id)
where
    t2.tab_date IS NULL or t2.tab_date > current_date - interval 1 month;
</pre>
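<p>This will usually be slower than the following query, which filters inside the join condition (with a left join, check that both forms return the rows you expect):</p>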
<pre class='brush: sql;'>
select
    t1.tab_id,
    coalesce(t2.tab_date, current_date) as tab_date,
    t1.tab_name
from
    table1 as t1
        left join table2 as t2 on (t1.tab_id = t2.tab_tab_id
            and t2.tab_date > current_date - interval 1 month);
</pre>
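<p>With a left join, the two queries above can return different rows, so it is worth checking them against real data before comparing their speed. A minimal sketch, assuming hypothetical sample rows (the column types and values below are illustrative and are not taken from a real schema):</p>
<pre class='brush: sql;'>
-- hypothetical tables and rows, only for illustration
create table table1 (tab_id int primary key, tab_name varchar(20));
create table table2 (tab_tab_id int, tab_date date);

insert into table1 values (1, 'recent'), (2, 'old only'), (3, 'no rows');
insert into table2 values
    (1, current_date),
    (2, current_date - interval 1 year);

-- filter in where: the row with tab_id = 2 is discarded after the join
select t1.tab_id
from table1 as t1
    left join table2 as t2 on (t1.tab_id = t2.tab_tab_id)
where
    t2.tab_date is null or t2.tab_date > current_date - interval 1 month;

-- filter in the join condition: tab_id = 2 is kept, with a NULL tab_date
select t1.tab_id
from table1 as t1
    left join table2 as t2 on (t1.tab_id = t2.tab_tab_id
        and t2.tab_date > current_date - interval 1 month);
</pre>
<p>With this data the first query returns the ids 1 and 3, while the second returns 1, 2 and 3, because a row of <tt>t1</tt> that only has old rows in <tt>t2</tt> passes the join filter as an unmatched row.</p>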
<p></p>
<h3>smaller set first</h3>
<p>The smaller set must be selected first. For example:</p>
<pre class='brush: sql;'>
select
    t1.tab_id,
    coalesce(t2.tab_date, current_date) as tab_date,
    t1.tab_name
from
    table1 as t1
        left join table2 as t2 on (t2.tab_tab_id = t1.tab_id
            and t2.tab_date > current_date - interval 1 month);
</pre>
<p>This will be slower than:</p>
<pre class='brush: sql;'>
select
    t1.tab_id,
    coalesce(t2.tab_date, current_date) as tab_date,
    t1.tab_name
from
    table1 as t1
        left join table2 as t2 on (t1.tab_id = t2.tab_tab_id
            and t2.tab_date > current_date - interval 1 month);
</pre>
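<p>This is because <tt>t1</tt> has fewer rows than <tt>t2</tt>, and <tt>t2</tt> references <tt>t1</tt>; that is, <tt>t1</tt> is the smaller, referenced table. The same rule applies to the order of the selected columns and of the conditions in the where clause. Many query optimizers reorder such conditions on their own, so confirm any difference by measuring.</p>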
<p></p>
<h3>notes</h3>
<p>Always measure the execution time of your queries, so that you can tell which variant is actually faster on your data. These rules are not <i>absolute</i>.</p>
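<p>Besides timing a query, MySQL can show how it plans to execute it. A minimal sketch, assuming the <tt>tbl</tt> table from the first example exists; whether an index is used depends on your own schema, which is not shown here:</p>
<pre class='brush: sql;'>
-- ask MySQL for the execution plan instead of running the query
explain
select
    t.tab_id,
    t.tab_state,
    t.tab_date,
    t.tab_name
from
    tbl as t
where
    t.tab_date > current_date - interval 1 month
order by
    t.tab_date desc;
</pre>
<p>In the output, the <tt>key</tt> column shows which index was chosen, if any, and the <tt>rows</tt> column shows an estimate of how many rows will be examined. Comparing these values for two variants of a query is a useful complement to measuring their run time.</p>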
<br/><hr height="1px" width="50%" />
<div style='text-align: center !important;'><b>Copyright © 2010 Daniel Molina Wegener</b><br/><b>Atribución-No Comercial-Sin Derivadas 2.0 Chile</b><br/><a target='_new' rel="license" href="http://creativecommons.org/licenses/by-nc-nd/2.0/cl/"><img alt="Creative Commons License" style="border-width:0" src="http://coder.cl.qfl.wpcdn.arcostream.com/cc88x31.png" /></a></div>
<br/><hr height="1px" width="100%" />
<p><small>© Daniel Molina Wegener for <a href="http://coder.cl">coder . cl</a>, 2010. | <a href="http://coder.cl/2010/07/query-optimizations-quick-recipes/">Permalink</a> | <a href="http://coder.cl/2010/07/query-optimizations-quick-recipes/#comments">No comment</a><br/>Post tags: <br/></small></p>
]]></content:encoded>
			<wfw:commentRss>http://coder.cl/2010/07/query-optimizations-quick-recipes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>database usage rules</title>
		<link>http://coder.cl/2010/04/database-usage-rules/</link>
		<comments>http://coder.cl/2010/04/database-usage-rules/#comments</comments>
		<pubDate>Fri, 23 Apr 2010 14:39:55 +0000</pubDate>
		<dc:creator>Daniel Molina Wegener</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://coder.cl/?p=683</guid>
		<description><![CDATA[Modelling databases is not a simple task. We often find misspelled words, mixed-case names and other horrifying things in the databases of third-party systems. My rules are simple to follow, and most of them help ensure a well-designed database, with special care for how we query that database, creating reliable [...]]]></description>
			<content:encoded><![CDATA[<p>Modelling databases is not a simple task. We often find misspelled words, mixed-case names and other horrifying things in the databases of third-party systems. My rules are simple to follow, and most of them help ensure a well-designed database, with special care for how we query that database, producing table designs and queries that are reliable and optimised for speed.</p>
<p><span id="more-683"></span></p>
<p><br/></p>
<h3>table design</h3>
<p>Follow a simple rule: <i>fixed length</i> data types go first, starting with those that fit the <i>CPU register length</i>, followed by the types that have variable length. For example, a simple table design for MySQL could be as follows:</p>
<pre class='brush: sql;'>
CREATE TABLE customer
(
    customer_id bigint(11) NOT NULL AUTO_INCREMENT,
    cus_priority int NOT NULL,
    cus_state int NOT NULL,
    cus_products int NOT NULL,
    cus_date datetime NOT NULL,
    cus_name varchar(100) NOT NULL,
    cus_email varchar(50) NOT NULL,
    PRIMARY KEY USING BTREE (customer_id)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8;
</pre>
<p>If you look at the column order, fixed-length key columns come first, other fixed-length data types come second, and variable-length types come last in the table creation script. From the perspective of the RDBMS, the table structure is read the first time we query the table. Fixed-length columns can then be read in sequence as the RDBMS reads each row from disk or memory, whereas for each variable-length column it must first read the column length and then skip that many bytes to reach the next column, every time a row is handled. So reading variable-length columns is more work for the RDBMS.</p>
<p>Now imagine tables created with <i>variable length keys</i>! Such keys add heavy overhead to any kind of table, so try to use fixed-length keys such as <i>int</i> and <i>bigint</i>. Even keys with a fixed length built from <i>char(n)</i> add heavy overhead, since they are compared with <i>string comparison algorithms</i> instead of the <i>memory block</i> comparison used for integer types, which is easier to handle.</p>
<p>Using a naming convention matters too. The table above has a lowercase name and lowercase column names, and each column has a prefix related to the table. Why the prefix? When we query a table, it is easier to identify its columns by their prefix, especially in joined queries, where the prefix gives every column a unique identifier.</p>
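<p>The key rule above can be sketched with two hypothetical variants of the same table. The table names <tt>customer_by_email</tt> and <tt>customer_by_id</tt> and the index name <tt>uq_customer_email</tt> are assumptions made only for this illustration:</p>
<pre class='brush: sql;'>
-- discouraged by the rule: a variable-length column as the key
CREATE TABLE customer_by_email
(
    cus_email varchar(50) NOT NULL,
    cus_name varchar(100) NOT NULL,
    PRIMARY KEY (cus_email)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- preferred by the rule: a fixed-length integer key,
-- with a unique index that still forbids duplicate emails
CREATE TABLE customer_by_id
(
    customer_id bigint NOT NULL AUTO_INCREMENT,
    cus_name varchar(100) NOT NULL,
    cus_email varchar(50) NOT NULL,
    PRIMARY KEY (customer_id),
    UNIQUE KEY uq_customer_email (cus_email)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8;
</pre>
<p>Other tables would then reference <tt>customer_id</tt>, so joins compare integers rather than strings.</p>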
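<p>To illustrate the prefix convention, here is a sketch of a joined query. The <tt>purchase</tt> table and its <tt>pur_</tt> columns are hypothetical and exist only for this example:</p>
<pre class='brush: sql;'>
-- hypothetical second table, following the same prefix convention
CREATE TABLE purchase
(
    purchase_id bigint NOT NULL AUTO_INCREMENT,
    customer_id bigint NOT NULL,
    pur_state int NOT NULL,
    pur_date datetime NOT NULL,
    PRIMARY KEY (purchase_id)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- cus_state and pur_state cannot be confused, even without aliases
select
    cus.customer_id,
    cus.cus_state,
    cus.cus_date,
    pur.pur_state,
    pur.pur_date
from
    customer as cus
        left join purchase as pur on (cus.customer_id = pur.customer_id);
</pre>
<p>Without the prefixes, both tables would expose columns called <tt>state</tt> and <tt>date</tt>, and every joined query would need explicit column aliases to tell them apart.</p>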
<p><br/></p>
<h3>query optimisation</h3>
<p>Some older RDBMSs do not support the <i>join</i> keyword, so joined queries had to be written with the <i>where</i> clause. Nowadays most RDBMSs support the <i>join</i> keyword, which can allow faster queries, so it is worth converting the older <i>where</i>-style joins into explicit joins.</p>
<pre class='brush: sql;'>
select
    pro.product_id,
    pro.pro_name,
    sum(coalesce(pay.pay_price, 0.0)) as payment
from
    product as pro
        left join subscription as sus on (pro.product_id = sus.product_id)
        left join payment as pay on (sus.payment_id = pay.payment_id)
group by
    pro.product_id,
    pro.pro_name
order by
    pro.product_id,
    pro.pro_name
</pre>
<p>The query above produces a report of the <i>payments</i> related to each product in the database. Notice that there is no where clause. Also examine the join conditions: the <i>smallest</i> dataset is on the <i>left side</i> of each condition. The more we filter in the join conditions, the faster the query. That query took just <i>0.002</i> seconds. Let&#8217;s see what happens with a non-joined query that uses the <i>where</i> clause.</p>
<pre class='brush: sql;'>
select
    pro.product_id,
    pro.pro_name,
    sum(coalesce(pay.pay_price, 0.0)) as payments
from
    product as pro,
    subscription as sus,
    payment as pay
where
    (pro.product_id = sus.product_id and sus.product_id is not null)
    and (pro.product_id = sus.product_id or sus.product_id is null)
    and (sus.payment_id = pay.payment_id or pay.payment_id is null)
group by
    pro.product_id,
    pro.pro_name
order by
    pro.product_id,
    pro.pro_name
</pre>
<p>Using the <i>where</i> clause, the same report took <i>0.012</i> seconds. What happened on the RDBMS side? In the first query, rows were filtered while the RDBMS was reading them from each table, whereas in the second query the selection was made over the complete dataset retrieved from all the tables involved. Note that, as written, the <i>where</i> conditions only keep products that have a matching subscription, so the second query behaves like an inner join rather than a left join. In this case the joined query is about six times faster (0.002 versus 0.012 seconds), and the same happens with more complex queries. The <i>having</i> clause is costly too, since it is applied last, to the rows that remain after the selection process:</p>
<pre class='brush: sql;'>

select
    pro.product_id,
    pro.pro_name,
    sum(coalesce(pay.pay_price, 0.0)) as payment
from
    product as pro
        left join subscription as sus on (pro.product_id = sus.product_id)
        left join payment as pay on (sus.payment_id = pay.payment_id)
group by
    pro.product_id,
    pro.pro_name
having
    payment > 200000
order by
    pro.product_id,
    pro.pro_name
</pre>
<p>The <i>having</i> clause can filter on <i>aggregate columns</i>, but that filtering happens at the end of the selection process, so try to replace a <i>having</i> condition with a join or where condition whenever you have the option.</p>
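<p>A condition on an aggregate such as <tt>sum()</tt> has to stay in <i>having</i>, but any condition on plain columns can move to <i>where</i>, so that rows are discarded before they are grouped. A minimal sketch, where the <tt>product_id</tt> threshold is a hypothetical filter added only for illustration:</p>
<pre class='brush: sql;'>
select
    pro.product_id,
    pro.pro_name,
    sum(coalesce(pay.pay_price, 0.0)) as payment
from
    product as pro
        left join subscription as sus on (pro.product_id = sus.product_id)
        left join payment as pay on (sus.payment_id = pay.payment_id)
where
    -- row-level filter: applied before grouping (hypothetical threshold)
    pro.product_id > 100
group by
    pro.product_id,
    pro.pro_name
having
    -- aggregate filter: can only be evaluated after grouping
    payment > 200000
order by
    pro.product_id,
    pro.pro_name;
</pre>
<p>Writing <tt>pro.product_id > 100</tt> in the <i>having</i> clause would give the same result here, but the groups for the excluded products would be built first and thrown away afterwards.</p>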
<p><br/></p>
<h3>rdbms tuning</h3>
<p>Each RDBMS has its own parameters. For example, MySQL has parameters that let you keep entire tables in memory, so most queries against those tables are faster than reading from the hard drive. If your system queries a database concurrently, you should increase the memory limit so that the RDBMS can keep those tables in memory. Research which options best fit the RDBMS you are using, and which kinds of keys and table formats best fit your needs. For example, in MySQL the InnoDB table format is not always the best option, since InnoDB tends to use more resources than the other table types available in MySQL.</p>
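<p>As a starting point for that research on MySQL, the statements below inspect two memory-related settings and the storage engine of a table. This is only a sketch: which variables matter, and which values suit them, depends on your MySQL version, your storage engines and your workload.</p>
<pre class='brush: sql;'>
-- memory used by InnoDB to cache table data and indexes
show variables like 'innodb_buffer_pool_size';

-- memory used by MyISAM to cache index blocks
show variables like 'key_buffer_size';

-- engine, approximate row count and data size of one table
show table status like 'customer';
</pre>
<p>Comparing the reported data size with the configured cache sizes gives a rough idea of whether the table can stay in memory.</p>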
]]></content:encoded>
			<wfw:commentRss>http://coder.cl/2010/04/database-usage-rules/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
