ThingyMaJig

Thingy Ma Jig is the blog of Nicholas Thompson and contains any useful tips, sites and general blog-stuff which are considered interesting or handy!

Connect

LinkedIn GitHub

Topics

announcement 25 apache 3 Apple 1 bash 8 code 7 cool 30 Days Out 8 Dark Basic Pro 4 design 12 doctor who 1 Drupal 74 E4600 1 EOS 400D 3 firefox 2 Flickr 3 free 21 games 5 geek 38 git 2 GreaseMonkey 1 hardware 7 Homebrew 1 How to 37 humour 5 iphone 1 javascript 1 jquery 1 K800i 6 k850i 4 lighttpd 3 linux 33 mac 9 miscellaneous 4 mobile phone 9 music 4 mysql 8 n73 1 n95 1 New Relic 1 Ogre3D 1 OS X 2 performance 3 photos 10 programming 40 Quicksilver 1 review 19 security 3 SEO 6 software 12 svn 2 technology 4 tip 7 tips 10 tv 3 video 3 vim 7 webdev 2 websites 33 wii 1 windows 1 YADS 10

MySQL: Count occurrences of string

Posted on 17 February 2010 in
tip mysql geek Drupal

Today I needed to quickly find out a list of node's which did not contain a certain token (one we use on PPOnline to insert adverts into articles). There were meant to be 2 of these tokens per article but we had noticed some only had one. Some didn't have any at all!

Now, this might already exist as a neat function somewhere, but I couldn't find it. So I wrote this little query to find the number of occurrences of a substring in a larger block of text.

I did look into using the SUBSTRING() and SUBSTRING_INDEX() commands, however I couldn't see a way of efficiently using them to find any number of occurrences. So I settled on the following…

[adsense:468x60:4496506397]
SET @findme="MySQL";
SELECT
  n.nid, n.title,
  CAST((LENGTH(nr.body) - LENGTH(REPLACE(nr.body, @findme, ""))) / LENGTH(@findme) AS UNSIGNED) AS findme_count
FROM node n
INNER JOIN node_revisions nr ON nr.vid = n.vid
WHERE n.type = "blog"
ORDER BY findme_count DESC;
[adsense:468x60:4496506397]

The above example will list all node's of type blog. It will then compare the length of the body before and after removing all instances of the string you're looking for. The result should be n * LENGTH(find_me) where n is the number of occurrences. This is why we divide by the length of the find_me string.

This is a pretty fast query too. On our live database, it searched over 1,400 nodes in just over 31 milliseconds.

I hope this saves someone some time in the future!