Sphinx, MySQL

Sphinx is live and kicking

2008-02-24 | Comment?

Sphinx logoAs I have mentioned in my previous post, I have started using Sphinx as a search engine on my site. It was a very exciting week for my site with a record breaking number of visitors. Sphinx has handled it great. I must say that MySQL full text search have done a great job too. There was some slow queries before, but after adding 1GB of RAM memory couple months ago, I really haven’t experienced any major problems with MySQL full text search. But I have realized that I would have to switch to some better search engine sooner or later.

Sphinx has replaced MySQL full text search successfully. Search is very fast and the CPU usage is lower even with the increasing number of visits. Not only that I don’t use MySQL fulltext for searching, but the removing full text indexes surely helped the CPU, since it doesn’t have to constantly update fulltext indexes on each update. Instead, I have implemented Sphinx main+delta scheme for re-indexing. Main index is re-indexed nightly while the delta index is rebuilt every 15 minutes.

So far, so good…

Solr, Sphinx, Lucene, PHP, API, MySQL

Sphinx and Lucene search engines, first impressions

2008-02-15 | 2 Comments

MySQL full text search is the easiest way to provide full text functionality on your site. It is easy to implement and easy to use. You just have to specify MySQL FULLTEXT index, and after that you search the tables with appropriate SQL queries.

However, there are some disadvantages. On a large databases full text search might perform very slow. Also, MySQL full text indexes are available only on MyISAM tables, so if you are using InnoDB table engine to store critical data, you won’t be able to use MySQL full text search.

In order to scale the search of large data sets you will have to look for the solution beyond MySQL. Here I will write about my first thoughts on two popular open source search engines, Lucene and Sphinx.

I will not talk about performances and speed. This post presents basic things about installation, configuration and implementation of each search engine, so it might help you to choose between Lucene and Sphinx if you want to switch from MySQL full text search on something better, quickly and easily.

MySQL Lucene Sphinx search comparationDon’t think about performances at this moment, search should be faster than MySQL full text search. I will show you only one slide from this presentation which you can find on MySQL Performance Blog to demonstrate the speed difference between MySQL, Lucene and Sphinx on a 5GB test database (of course, results may vary depending on your datasets). If you are interested in alternative full text search engines, you should take a look on the entire presentation (38 slides, PDF)

Lucene

Lucene logoLucene is actually a Java library, so basically you don’t have to install anything to use Lucene. It is a very powerful tool for indexing and searching documents. Lucene doesn’t care do you index files, database records or something else. It is your duty to prepare data and send it to Lucene index as Document object. That means that you will need some extra work to index your database records, or whatever data you want to search. Once created, index can be easily updated or expanded by adding/updating documents from your scripts and programs.

Lots of popular heavy traffic sites are powered by Lucene search like Wikipedia or Digg, so the performance and scalability is not under the question. Lucene is also rich with search options providing boolean search, phrase search, wildcard search, field search and other features.

Implementation around Lucene might be a little bit problematic if you are not using Java. If your project is in PHP, like mine, you can use something like PHP-Java bridge. There is also a PHP Lucene class in Zend framework. I haven’t tried the class, but I have read here that indexing and searching performances are very poor comparing to the original Lucene Java class. It might not be the truth, but you should have that in your mind.

Solr is perhaps the best way to use Lucene search on any platform. Solr is also a Java application and you have to install it on some servlet container, like Tomcat. Once installed, you can access Solr from any development platform using REST protocol. You send add/update XML requests to the update URL (for example http://localhost:8080/solr/update) to index documents.

<add>
<doc>
<field name="id">05991</field>
<field name="description">This is a blog for webmasters and developers</field>
</doc>
</add>

On search, you send GET queries to the Solr select URL (for example http://localhost:8080/solr/select?q=test) and you will receive XML response from the Solr containing structured information with search results. Cool, right? Here you can find the list of very popular web sites that use Solr for search.

Sphinx

Sphinx logoSphinx is a full text search engine and it is ready-made to be used with open source databases like MySQL (MyISAM, InnoDB) and PostgreSql. It is very easy for installation. Configuration is simple as well. Basically, you will specify the SQL connection parameters and the SQL query to fetch the data from your database, and the Sphinx indexer will create the index for your database.

Indexing is very fast, but the problem is that there is no way to update index partially, for example to update/add only couple of records. Instead, you have to re-index an entire dataset from the beginning. However, there is a proposed solution that can be implemented automatically. You create a main index and delta index with recently added/updated records. You regularly re-index only delta index, while you re-index main only periodically. As far as I can see, this works very good in practice once you get it right, at least for my 1GB table, especially because, as I already sad, indexing is very fast. In my case, 5000+ documents delta index is re-indexed in a split second.

All major programming languages are supported, including PHP, Python, Java, Ruby and C++. Sphinx supports features like boolean search, phrase search, field search… It doesn’t support wildcards search, but I’m not crazy about that feature.

My pick

I have tried Lucene with Solr. It needs more coding to index data and an extra work to install and configure Tomcat and Solr. Although everything worked fine on my desktop, I had trouble installing Tomcat on the production server so I have given up Lucene and Solr for now.

On the other hand, Sphinx is very easy to install and configure. If you are switching from MySQL full text search, you only need to adjust your scripts to use sphinx API in combination with SQL instead of MySQL full text search query. I still haven’t launched the Sphinx search on the production server, but the test system is working fine. Basically, Sphinx has all the features I need, I only hope that they will add some live updates and/or partial re-indexing soon. :(

It is up to you. If you are not using Java for your project and you don’t want to spend extra time installing and configuring php-java-bridge or Tomcat/Solr, Sphinx might be the better and instant solution, especially if you are migrating from MySQL full text search. However, if your data sets are frequently updated, you will have to seriously consider can you organize partial indexing with Sphinx main+delta indexing scheme.

Lucene however is still known as the most advanced open source search engine with lots of search options, live updates and other things, so if you need all those functionalities, it might be better to go straight to Lucene and Solr.

Amazon Web Services, Facebook, API

Building Facebook applications on Amazon Web Services (AWS)

2008-02-05 | Comment?

Recently I have written about my first impressions on Facebook platform. Amazon Web Services today announced that they are teaming up with Facebook developers team to provide easy way to build instantly scalable Facebook applications on Amazon Web Services, S3 Simple Storage Service and EC2 Elastic Compute Cloud. According to Amazon, you can use their web services to build scalable Facebook applications which will ensure reliability, flexibility, and cost-effectiveness while the popularity of your Facebook applications grows.

I haven’t checked this in details, but if you are familiar with AWS and Facebook platform, you can take a look on this EC2 hosting tutorial, and hello world example. If you have some time, send your comments about your impressions on AWS and Facebook platform…

Domains

Domain tasting company have bought the domain in the front of your nose, how to prevent and react

2008-01-27 | Comment?

It is a good timing for a post about domain tasting since it was recently reported that Google is considering to stop serving AdSense ads on tasted domains. This is a good news, but Google AdSense is not the only revenue source for the domain tasting companies which means that you can still become a victim (yes, a victim!) of domain tasting.

How does it works

I have read about domain tasting for the first time on Bob Parson’s blog (GoDaddy CEO) and you can also check Wikipedia entry for more details. In brief:

  • Domain tasting companies abuse the 5 days grace period when purchasing domains with some registrants.
  • They buy domains and put ads on it.
  • After 5 days they use 5 days money back clause to cancel the domain.
  • Even if they had only one ad click in that period that earnd them $0.10 it is a pure profit since domain cost them $0 in 5-days period.
  • Multiply this with tens of thousands domains registered per day and you have a very profitable business (to be more accurate, according to Bob, 32.000.000 domains have been tasted back in May 2006! 32M in one single moth!?!).
  • Sometimes, they might hit the “jack pot”. For example, domain have earned whole $1 in 5 days. Domain tasters estimates that it will gain profit from that domain over the year so they usually keep those domains.

And they have registered the domain in the front of your nose?

How do the tasting companies decide which domains to purchase. Sometimes randomly perhaps or by trying different words combinations from the English dictionary. They also buy expired domains by default. But lots of people experienced that domain was registered in the front of their nose. One day you check the domain name and it is available. You come back tomorrow to buy it and the domain is sold?!?! And that doesn’t happen by accident, since the people who were searching for specific non-English word domains also experienced the problem. More likely is that you have some spyware installed on your computer.

Not so likely, you have some really evil spyware that track your searches on GoDaddy or any other domain registrants.

More likely
, some extension that you are using is selling your anonymously data to the domain tasters. If you visit some domain from your browser and the domain doesn’t exist, browser extension can detect that and send the data back to the server. Domain tasters would kill for that information, unregistered domain name that someone has accidentally visited is what they are looking for. Various companies buys anonymusly statistic data directly from extension developers, so watch it! Check the privacy policies of the extensions you are using! Even some useful browser extension might collect and sell anonymously traffic data to other companies. Usually, that kind of things won’t heart you, but in this case it might cost you to lose your desirable domain name.

How to prevent

Now when we do know how they are finding all those domain names to buy, it is easier to prevent it.

  • It might be a good idea to search for available domains by using whois from command line. :)
  • Check privacy policies of the extensions you are using.
  • If you have lots of browser extensions installed, use some other browser for domain searching. For example, if you are using Firefox, install Opera only for this purpose.
  • If you find a good domain name, buy it immediately even if you are not sure is that what you are looking for! Domains are cheap today. If you decide to pick another domain name for your site, you can easily sell that domain later on a domain after market and possibly even earn a few dollars.

And how to react

But if it is already to late, and domain tasters have taken the domain in the front of your nose, don’t panic. The good thing is that you are now aware of domain tasting business model and how does it works.

  • Check the whois for the domain and see who have registered the domain and when.
  • Search for more info on the company that have registered your domain. Do a little Google research and you will find in which context the company is mentioned on forums and blogs. For example, when the friend of mine experienced the domain tasting problem, we have searched for the company and found this and similar posts. In that moment we knew that domain tasting company have bought the domain.
  • If you realize that domain tasting company have bought your domain, there is a great chance that domain will be available again in 5 days. But, you have to be COOL!
  • Don’t visit the domain in the next 5 days and even if you do by accident, don’t click on anything. Don’t generate any traffic or profit to them since domain tasting company might decide to keep your domain.
  • Don’t tell anyone! Don’t write to your friend over IM that someone grab the xyz.com in the front of your nose since your friend will probably click on the link to see the homepage. And of course, don’t blog or twitter about it as well. :) Just sit and wait.
  • If you stay cool and patient, your domain will probably become available again in 5 days so go register it as quickly as possible! :)

Smarty, PHP

Using Smarty PHP template engine

2008-01-24 | Comment?

I believe that everyone have started to use PHP by adding some dynamic content to static HTML pages. Usually you would just add <?php ?> inside your HTML code. But over the time, if you add more and more PHP code, database support, etc. your scripts might become really messy, which is not good. Also, over the time PHP evolved from Hypertext Preprocessor in serious programming language which caused a different approach to PHP sites/applications. It is a bad practice these days to mix your PHP and HTML, logic and presentation. Instead you should try to distinguish your PHP and logic from the presentation (HTML and PHP).

You can use PHP for that as well. Separate your logic and presenation in two groups of PHP scripts. You would do all calculation and data preparation in the first group of scripts and then you will present the data with other group of scripts which would usually contain HTML code with the addition of dynamically generated content by printing it with <?php echo $my_var;?>. But if you have ever edited Word Press theme, you will see that HTML code becomes unreadable with the heavy usage of the <?php ?> elements inside.

That is where Smarty comes. You can write very clean HTML templates by using Smarty variables and Smarty commands. Smarty will also make the intermediate PHP code from your template that looks alike Word Press themes I have just mentioned, but you don’t have to worry about that. All you have to do is to make the pure HTML templates. Intermediate PHP code will be cached so your performances won’t be decreased.

Smarty is very simple tool. You create the object, assign values to Smarty variables and on the end of the script, you will call the display method to publish the page.

$smarty->assign('title', 'Hello, World');
$smarty->assign('message', 'some text');
$smarty->display('welcome.tpl');

Why I really like Smarty is that you can easily present the data in many different formats. For example, I use the same code base to retrieve data form the database and prepare objects and then use the different smarty templates to publish content for the different platforms.

Using Smarty

Lets take a look on the diagram above. Various scripts use my base classes to retrieve data from the databases. Once formated and prepared for publishing, by using different Smarty templates, the same data sets can be used to produce various content like web page, rss feed, content for mobile devices or even FBML for Facebook application.

PHP, eAccelerator

eAccelerator

2008-01-15 | Comment?

Problem with script languages is that the scripts have to be parsed and compiled each time when the script is executed. Each time when you visit some PHP page, your server must parse and compile the script before executing. Can you imagine how much CPU time would be saved if those scripts are parsed and compiled only once, and then executed multiple times? Well, you don’t have to imagine because softwares, like eAccelerator, do exactly that.

eAccelerator is a free open-source PHP accelerator, optimizer, and dynamic content cache. It increases the performance of PHP scripts by caching them in their compiled state, so that the overhead of compiling is almost completely eliminated. It also optimizes scripts to speed up their execution. eAccelerator typically reduces server load and increases the speed of your PHP code by 1-10 times.

Installation

If you have the dedicated server, and you seek the way to improve your server performances, I would highly recommended you to install the eAccellerator. Although I’m not very experienced Linux admin, I had no problems with the installation by following the installation from source guide. Before compilation check all compile parameters if you want to enable additional functions like eaccelerator_put and eaccelerator_get (I will write about using shared memory functions in some other post). If you are running Windows machine at home, here you can find eAccelerator Windows binaries.

Configuration

You can find details about all eAccelerator configuration variables here. Probably the most important variable is the eaccelerator.shm_size which represents the size of the memory dedicated to the eAccelerator. How much memory you have to dedicate to the eAccelerator? Well, with the eAccelerator you will also get the control.php that you should copy on your web server. By navigating your browser to the control.php you will see the eAccelerator memory usage and other things on the web page. Set some starting shm_size value and monitor this page first few days and then see should you increase or decrease memory size for eAccelerator. From this control panel you can also easily clear the eAccelerator cache from the memory.

eAccelerator Control Panel

If eAccelerator is unable to cache the script in memory, it will by default use file system for the caching by saving cached content inside the directory specified with eaccelerator.cache_dir. If you want to use only memory for the caching, set eaccelerator.shm_only to 1. Also check the eaccelerator.shm_ttl and eaccelerator.shm_prune_period. This variables tells eAccelerator which old scripts to remove from memory if there is not enough shared memory to cache a new script.

Facebook, API

Facebook platform

2008-01-08 | 3 Comments

facebook platform logoRecently I have played a little bit with the Facebook platform and Facebook application development. My aim was not to become Facebook expert, I just wanted to see what they have to offer and how quickly I can assemble a simple Facebook application.

Facebook platform contains several key components:

Interface (API) is the set of methods which you can use to access user data, friends lists, groups, photos etc. There are also data methods in beta which allows you to use Facebook as storage for your applications. Basically, you access API through REST interface, but you should download and use existing Facebook client libraries. Client libraries exists for all popular web programming languages. So instead of building REST request, sending it to the Facebook server and receiving and parsing the results, you will just call the predefined method from the Facebook library (php example: $facebook->api_client->friends_list) and you will get the results in array or some other data type, depending of method you have just called.

FBML is the Facebook extension of the XHTML which allows you to put standard Facebook components on your application canvas.

FBJS is the Facebook Javascript. Although regular Javascript will usually work, Facebook developed some extended Javascript objects and methods which help your application to be executed in its own name space among other Facebook apps.

FQL is Facebook query language. As far as I have seen, you can obtain similar data with FQL like you can with the API, but in this case you use SQL syntax to retrieve friends list and other stuff.

Getting started

After you check the anatomy of Facebook applications it is the best practice to start your own hello world application by following the step by step guide. Very soon you will have your own Hello World application and after you have learned basic stuff it is up to you to upgrade your application and build anything you want on top of it.

In brief, your application is hosted on your web site and hosting. You put your application in, for example http://yourdomain.com/myfacebookapp/ and that will be your „callback URL“. Any access to the http://apps.facebook.com/yourappname/somepage.php will cause Facebook to fetch the page from http://yourdomain.com/myfacebookapp/somepage.php and display the script’s output inside your application canvas. Of course, Facebook will not just display the output as-is. It will first transform your FMBL in regular XHTML by replacing FMBL tags with HTML elements with appropriate css class names. Application will be put in its own name space, for example your <div id=’header’> will become <div id=’app688273534_header’>. Similar thing will happen with Javascript (FBJS) and styles that reference that element using it’s id. For example #header {color: #c0c0c0;} will become #app688273534_header {color: #c0c0c0;} Oh, and by the way, you can’t link your external css files, you have to embed styles with <style> tag inside your FMBL page.

First impressions

Building Facebook apps is not a rocket science. Basically you just build web pages as usual but you use some additional FMBL elements. At least that was the idea. Unfortunately, you will not find FMBL elements for some common Facebook components that you can see in Facebook apps, so you will be forced to create your own divs and spans and assign Facebook css classes or even create your own css classes in Facebook style to get Facebook look and feel. On the other hand, some existing FMBL elements are not customizable enough, and again, you will have to create your own HTML to imitate look & feel of Facebook apps.

For example, this code will produce the common header menu for your application.

<fb:dashboard>
<fb:action href="new.php">Create a new photo album</fb:action>
<fb:action href="you.php">Photos of You</fb:action>
</fb:dashboard>

facebook dashboard

As you can see, it will also include your application name. Lets say that you don’t want your application name to be displayed here, you want to put your content just below the menu or you want to put your logo instead of name, you can’t do that because <fb:dashboard> will by default print your application name and you only have the property to turn application icon on or off on the left side of your application name. Anyway, you will have to make your own header menu without using <fb:dashboard> to avoid application name appearance.

I hope that Facebook will make lots of this fb elements more customizable and that they will offer even more elements, some of them are already in betas. Since I’m not the only one that thinks that the list of fb elements should be expanded, there is a place where you can suggest and request new fbml elements.

Web search

Keep the visitors on your site and reduce the bounce rate

2007-12-18 | Comment?

My news aggregation site receives a lot of traffic from Google, which is great of course. Most of the hits from Google comes from the long tail for various keywords and usually don’t hit the homepage or some of the main categories but some secondary pages. Unfortunately, lots of users from those hits don’t hang up on the site long enough. They find the article they want and proceed to the other site to read the article.

Although the user base of my regular visitors are growing, I wanted to tease even more new visitors to explore the site little bit more before they leave it. I have introduced a few tweaks on my secondary pages that shows up to the visitors referred by the Google search.

New window

The first thing I have tried is to put target=“_blank“ on all external links if the visitor is referred to the site from Google Search. Usually I let external links open in the same window, but if the user is referred from Google Search, I don’t want just to push the user on the other site and let him forget about my website. The user will at least see your site once more, when he returns to close the window/tab, but if you offer quality content and service, it might also explore your site.

Site introduction, link to homepage

I have offered visible color box with welcome message explaining my site’s niche. So perhaps the visitors just hit the page searching for some gossip, but I want to let him know that I offer great news aggregation on my home page and that he should visit my homepage to see it. I’m not afraid to make these box clearly visible and colored since it will be visible only to visitors that are coming from Google search, not to my regular visitors.

Using Google keywords to provide search on your own site

The last, and most efficient method I have used is to use keywords that visitor have searched for on Google and offer the visitor a search on my site for the same keywords. All this information are contained in the referrer url, you just have to extract them. So if the users searched for paris hilton and ended up on some page of your site, you can offer him to search all of your content for the keywords paris hilton.

If you have configured some AdWords campaign, you know that Google suggest you to use the keywords you have targeted in the title of your ad. Same principle works here, when the visitor sees the keywords he just typed, there is much more possibility that the visitor will click on the link. If you have any kind of search on your site/blog you can implement the same method.

Conclusion and results

Visits on my site are growing constantly and because of this I can’t measure the exact effects of these methods. Although the bounce rate is still the same or 1-2% reduced, the number of visitors in total is growing so I’m satisfied with the performance.

However, I do count the clicks on links that I have described in sections two and tree. These links record over 300 clicks daily. That might be 300 new potential regular users every day, and that is great.

Caching, MySQL

How to improve MySQL query cache hit ratio

2007-12-13 | Comment?

MySQL logoI have already mentioned one of the downside of MySQL query cache, query invalidation. Each time when you insert/update/delete records from the table, cached queries from that table will be invalidated even if the edited records are not in the cached subset.

The simple way to improve MySQL query cache hit ratio is not to update tables frequently :) That is of course impossible in most of occasions, but perhaps you can make that effect by using one table for read-only purposes.

I own news aggregator similar to Google News and Topix, but localized for my country. Database is updated constantly, crawler fetch news feeds and insert them into the database etc. Although there is a huge searchable archive of news article links, ~70% of traffic is targeted on homepage and news categories which contains daily news. Improving performance of daily news presentation would boost server performance, since that would be, as I said, ~70% of total traffic.

I have realized that for my site it is not a big problem if the new article doesn’t came up on the home page on the same minute when it was inserted in the database, so I have chosen performance over data consistency.

Queries that I usually use have couple of joins. To keep this post readable, I will present you simplified queries:

SELECT a.article_id, a.title, a.date, c.name, s.name
FROM articles a
INNER JOIN categories c ON (a.category_id=c.category_id)
INNER JOIN sources s ON (a.source_id=s.source_id)
WHERE a.date=CURDATE();

So, what I have done is to „cache“ daily news in one separated table, including all joined data. Ids and other staff are indexed in the daily_cache table.

INSERT INTO daily_cache (a.article_id, a.title, a.date, a.category, a.source) VALUES
SELECT a.article_id, a.title, a.date, c.name, s.name
FROM articles a
INNER JOIN categories c ON (a.category_id=c.category_id)
INNER JOIN sources s ON (a.source_id=s.source_id)
WHERE a.date=CURDATE();

After that, all queries that fetch daily news are directed to daily_cache table:

SELECT a.article_id, a.title, a.date, c.name, s.name FROM daily_cache;
or
SELECT a.article_id, a.title, a.date, c.name, s.name FROM daily_cache WHERE category_id=1;
etc.

What have I achieved with this:

  • MySQL and OS server would probably keep daily_cache table in working memory since the table is not so big and it is frequently used. I’m using regular MyISAM table type, but you can also consider MEMORY table type.
  • We have a flat records in daily_cache table with all necessary informations for displaying, so the performance will not be decreased because of joins.
  • daily_cache is updated on every 30 minutes (complete truncate and insert with rotation, so we never display data from an empty table) which means that MySQL query cache will have an easy job of keeping frequently used queries in cache for 30 minutes.
  • As I said, daily news pages get around 70% of total traffic, so this method significantly reduced server’s CPU usage and improved performances.

Conclusion

This is not the universal way of improving performance of your database server, but it might be useful in some occasions. Your data might be to critical not to be displayed on the homepage for 30 minutes. Or you have lots of data so the inserting records in cache table and index updating would take ages, etc.

I do believe that there is no universal solutions for every problem. For example, MySQL query cache tries to provide an universal solution, it will cache all your queries etc. and we have seen that it might not be the good idea and in many cases it won’t improve performances significantly.

It is up to you to make custom caching solution that will suits your needs. But, by reading other peoples case studies you can just get the better idea how can you improve your web site performance.

Caching, MySQL

MySQL query cache

2007-12-09 | 1 Comment

On the Internet, caching is everything. Domain name servers cache your DNS records, proxies and browsers tries to cache your content, your’s server’s OS cache frequently used files into the memory… Although these are all different types of caching, they all have one thing in common, they improves overall performance of the system.

If you want to improve performance of your web site and you seek for the solution on blogs or forums, suddenly you will see keyword “caching” popping up all over you. I guess that the first stage of “caching” on any database driven web site would be to enable database query cache and see how does it perform.

Configuration

MySQL logo MySQL supports query caching since version 4.x. It caches query results into the servers memory so the next time when you execute the same query, results will be served directly from the memory and MySQL doesn’t have to search for your records in the database again.

Key parameters that you should add to your my.ini file to enable query cache are:
query_cache_type=1
query_cache_size=64M
query_cache_limit=5M

  • query_cache_type (0) cache is disabled; (1) cache is enabled for all queries except the ones that contain SQL_NO_CACHE keyword; (2) cache is enabled on demand, it will cache only queries which contain SQL_CACHE keyword.
  • query_cache_size specify the dedicated memory which will be used to store queries, 64 megabytes in this case
  • query_cache_limit specify the maximum size of query result to be stored into the cache; if you have specified 5M limit and you call SELECT * FROM table which retrieves six megabytes of data, the result won’t be cached into the memory.

You can see your current query cache settings by typing the following query:

mysql> SHOW VARIABLES LIKE '%query_cache%';
+------------------------------+----------+
| Variable_name | Value |
+------------------------------+----------+
| have_query_cache | YES |
| query_cache_limit | 5242880 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 67108864 |
| query_cache_type | ON |
| query_cache_wlock_invalidate | OFF |
+------------------------------+----------+
6 rows in set (0.00 sec)

Practice

First lets try one query with query cache disabled. We can use SET GLOBAL to set the variable value during runtime, without editing my.ini and restarting the MySQL. I will perform some full text search with sorting on indexed field:

mysql> SET GLOBAL query_cache_type=0;

mysql> SELECT item_id, date FROM items
WHERE MATCH (name) AGAINST ('notebook' IN BOOLEAN MODE)
ORDER BY date DESC LIMIT 5;
+---------+------------+
| item_id | date |
+---------+------------+
| 212098 | 2007-12-04 |
| 212108 | 2007-12-04 |
| 211738 | 2007-08-04 |
| 211633 | 2007-05-11 |
+---------+------------+
5 rows in set (2.78 sec)

Now lets execute the same query again:

mysql> SELECT item_id, date FROM items
WHERE MATCH (name) AGAINST ('notebook' IN BOOLEAN MODE)
ORDER BY date DESC LIMIT 5;
+---------+------------+
| item_id | date |
+---------+------------+
| 212098 | 2007-12-04 |
| 212108 | 2007-12-04 |
| 211738 | 2007-08-04 |
| 211633 | 2007-05-11 |
| 211624 | 2007-05-11 |
+---------+------------+
5 rows in set (0.15 sec)

You might be deceived with this query time even without caching. Basically, your OS have cached some chunks of data into the working memory, so the MySQL accessed data much faster and provided you the results very quickly. But this is just my desktop machine, not the real production web server, and you don’t know for granted how the OS would handle your table files.

Now lets enable query cache again and repeat the query:

mysql> SET GLOBAL query_cache_type=1;
mysql> SELECT item_id, date FROM items
WHERE MATCH (name) AGAINST ('notebook' IN BOOLEAN MODE)
ORDER BY date DESC LIMIT 5;
+---------+------------+
| item_id | date |
+---------+------------+
| 212098 | 2007-12-04 |
| 212108 | 2007-12-04 |
| 211738 | 2007-08-04 |
| 211633 | 2007-05-11 |
| 211624 | 2007-05-11 |
+---------+------------+
5 rows in set (0.11 sec)

This time the query is executed in 0.11 seconds, but now it is cached for further use which you can see by executing:

mysql> SHOW STATUS LIKE 'qcache%';
+-------------------------+---------+
| Variable_name | Value |
+-------------------------+---------+
| Qcache_free_blocks | 1 |
| Qcache_free_memory | 5232592 |
| Qcache_hits | 0 |
| Qcache_inserts | 1 |
| Qcache_lowmem_prunes | 0 |
| Qcache_not_cached | 7 |
| Qcache_queries_in_cache | 1 |
| Qcache_total_blocks | 4 |

+-------------------------+---------+

So the next time when we execute the same query…
+---------+------------+
| item_id | date |
+---------+------------+
| 212098 | 2007-12-04 |
| 212108 | 2007-12-04 |
| 211738 | 2007-08-04 |
| 211633 | 2007-05-11 |
| 211624 | 2007-05-11 |
+---------+------------+
5 rows in set (0.00 sec)

…we will get the same results in 0.00 seconds, and each time after it will be the same since the result is served directly from the cache. Lets check the Qcache status again and we will see that now we have one hit on our query cache.

mysql> SHOW STATUS LIKE 'qcache%';
+-------------------------+---------+
| Variable_name | Value |
+-------------------------+---------+
| Qcache_free_blocks | 1 |
| Qcache_free_memory | 5232592 |
| Qcache_hits | 1 |
| Qcache_inserts | 1 |
| Qcache_lowmem_prunes | 0 |
| Qcache_not_cached | 7 |
| Qcache_queries_in_cache | 1 |
| Qcache_total_blocks | 4 |

+-------------------------+---------+

Lets explain some of the Qcache status variables:

  • Qcache_not_cached display the number of the queries which are not cached which also includes queries like USE, UPDATE, INSERT, etc.
  • qcache_lowmem_prunes is the number of queries which are removed from the cache due to the lack of available memory for query cache. If there is not enough memory for some query to be cached, MySQL will try to keep most frequent used queries in the cache and it will remove the ones that haven’t been used lately to free the memory.
  • Qcache_hits is ofcourse the number of times that cached results have been used instead of executing full query, and you can use it as one of the parameter to determine the efficiency of query cache for your project.

Limitations and exceptions

MySQL query is a little bit dummy. It does not analyze your query to much, it will just use your entire query as the key to access cached results. Pay attention to the letter casing and spacing in your queries, for example, following queries will be treated as different and each query result will be cached separately, although those queries are the same.

SELECT * FROM table;
select * from table;

Anyway, experienced developers that already use some coding and naming standard won’t have any problem with this obstacle.

Also, keep in mind that your query have to provide the same result each time to be cached, so the queries which use functions like RAND(), CURDATE() etc. will not be cached.

Query invalidation

Query will be deleted from cache and invalidated each time when the one of the tables from query change. MySQL won’t analyze cached records and weather some of the records have been changed, no. Any update/insert of the table will cause invalidation of already cached queries next time.

Lets take a look on this example:

SELECT * FROM categories WHERE category_id < 15

This query will be cached, it will cache first fifteen records from the table categories.

UPDATE categories SET name='Games' WHERE category_id=16;

Although it is obliviously that none of the records from previous select query is not changed with this update query since we have changed the record with id 16, previous query will be invalidated since the table categories have been updated.

This is one of the great disadvantages of MySQL query cache specially if you frequently update and insert records. However, in some cases you can overcome this obstacle and I will show you one example in some of the posts to come.

Here you can find the great post about other limitations of the MySQL query cache and MySQL query cache efficiency.

Conclusion

You can enable MySQL query cache and see how does it performs in your case. Although it is not the most efficient way of caching and does have some disadvantages, you can easily turn it on and off without changing your source code so it might be the easiest way to improve your web site performance.

You can find out more about query cache in MySQL documentation.

« Previous Entries