2011-07-30

How to install and implement Sphinx Search on XAMPP for Windows 7 with MySQL and PHP

If you have a MySQL-based website with lots of records (like hundreds of thousands) and run a full-text search on them, you will probably run into performance problems. The first thing most people probably do is use LIKE with the % wildcard. This quickly becomes too slow, of course, because MySQL has to search through the entire text of every record of the table and can’t use any indexes.

The first solution most people think of is MySQL’s own full-text index capability. However, this doesn’t work very well if you run the query on many text fields per record. In my case, the search was even slower.

Sooner or later, you have to think about creating additional full-text indexes. At a seminar once I heard about how powerful the Apache Lucene project is, so that’s the first thing I looked into. However, I quickly found out that it’s very complicated and requires a larger learning curve than I was willing to invest. In the online documentation for Lucene somewhere it said that if Lucene is too low-level for you then to try Apache Solr, so I looked into that. Solr is based on Lucene and runs on Java, so that means you need to install Apache Tomcat if you haven’t already, which I didn’t. I looked into installing it on my XAMPP installation and realized I would need to either update XAMPP to a version that included it or add it manually. This was turning into a larger project than I wanted it to be even before I could begin to think about the search server itself. So I looked into something else.

Here are some lists of search engine software (search servers):

Sphinx Search caught my eye because it was specifically designed to index MySQL databases, which is what I wanted primarily, although you can index just about anything else too, like XML files and web pages. It was also mentioned as particularly easy to implement.

This is true, although, of course, there are pitfalls, especially since, like most things web server, it is mainly aimed at Linux and not Windows installations. And this is the main reason for this tutorial, to help Windows users install and implement Sphinx Search with as little trouble as possible.

Resources that helped me:

Decide which release you want, generally the latest stable beta release as described here: http://sphinxsearch.com/downloads/

Download “Win32 binaries w/MySQL support” here: http://sphinxsearch.com/downloads/beta/

Unzip the zip file to C:\Sphinx. This directory should now include a few files, such as sphinx.conf.in, and subdirectories, such as api and bin.

Add the subdirectories data and log.

Copy the file C:\Sphinx\sphinx.conf.in to C:\Sphinx\sphinx.conf.

Open C:\Sphinx\sphinx.conf in a text editor.

Change the settings sql_user and sql_pass to match your MySQL installation.

Find and replace all occurrences of @CONFDIR@/ with C:\Sphinx\, at least in the uncommented lines.

Find and replace all occurrences of forward slash (/) with backslash (\) in these same lines. For example @CONFDIR@/data/test1 becomes C:\Sphinx\data\test1.

Leave the source and index settings as they are for now for testing purposes.

If MySQL isn’t running already, start it now.

Start cmd.exe as an administrator (click the Windows Start icon, type cmd in the search box without hitting Enter, right-click "cmd.exe", click “Run as administrator”). On Windows XP this is easier: no need to worry about administrator rights.

Go to your MySQL installation directory, for example enter cd c:\xampp\mysql.

Create the test database and fill it with data by entering:
mysql -u [youruser] -p < c:\sphinx\bin\example.sql (replace [youruser] with your user name).

Enter cd c:\sphinx\bin.

Index the test database by entering indexer.exe --config c:\sphinx\sphinx.conf test1.

Install Sphinx as a service by entering:
searchd.exe --install --config c:\sphinx\bin\sphinx.conf --servicename SphinxSearch

Start the service: Go to Control panel > Administrative tools > Services. Right-click on "SphinxSearch" in the list and click "Start".

Test the search by entering: search.exe --config c:\sphinx\sphinx.conf test. This should result in some matches for the search term “test”.

Copy the Sphinx Search PHP API from C:\Sphinx\api\sphinxapi.php to your site’s HTML directory, for example C:\xampp\htdocs\[yoursite]\sphinxapi.php.

Create a test PHP file with the following contents:



<?php
mysql_connect("localhost", "root", [yourpassword]);
mysql_select_db("test");
require_once('sphinxapi.php');
$s = new SphinxClient;
$s->setServer("127.0.0.1", 9312); // NOT "localhost" under Windows 7!
$s->setMatchMode(SPH_MATCH_EXTENDED2);
$s->SetLimits(0, 25);
$result = $s->Query("test");
if ($result['total'] > 0) {
echo 'Total: ' . $result['total'] . "<br>\n";
echo 'Total Found: ' . $result['total_found'] . "<br>\n";
echo '<table>';
echo '<tr><td>No.</td><td>ID</td><td>Group ID</td><td>Group ID 2</td><td>Date Added</td><td>Title</td><td>Content</td></tr>';
foreach ($result['matches'] as $id => $otherStuff) {
$row = mysql_fetch_array(mysql_query("select * from documents where id = $id"));
extract($row);
++ $no;
echo "<tr><td>$no</td><td>$id</td><td>$group_id</td><td>$group_id2</td><td>$date_added</td><td>$title</td><td>$content</td></tr>";
}
echo '</table>';
} else {
echo 'No results found';
}
?>


Run the PHP file in a browser. You should get a table of results that include the search term “test”.

Now it’s time to RTFM to implement this in your own application. You’ll need to adapt sphinx.conf to index your own MySQL database (http://sphinxsearch.com/docs/2.0.1/conf-reference.html) and adapt the API settings to suit your needs (http://sphinxsearch.com/docs/2.0.1/api-reference.html).

For example, in sphinx.conf you probably want to change the source src1 section to something meaningful. I use [database]_[table], for example source mydb_yourtable, since there can be more than one source section. I figure that avoids confusion, and if I have multiple databases/sources I can simply specify the list of indexes to search in the Query() call, for example:

$cl->Query ("test query", "db1_table1, db2_table3");

Change sql_db to your database.

Change sql_query_info to your table.

You also need to change sql_query to access the MySQL table you want to index. The first field must be a unique, unsigned, positive, integer ID. If your table doesn't have one, you need to add it to the table first. This is how you will later reference the rows found in PHP.

For example:
ALTER TABLE yourtable ADD id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

Adapt or delete the settings sql_attr_uint and sql_attr_timestamp.

If you changed the name of the source section then you need to change the name or comment out the section source src1throttled : src1.

You probably want to also change the index test1 section to something meaningful. I use [database]_[table] as above, for example index mydb_yourtable.

Change the source setting in the index section to the whatever you named the source above, e.g. mydb_yourtable.

You can also change the path directive to something more sensible than test1.

If you want wildcard support, set enable_star = 1. This means that star (*) can be used at the beginning and/or the end of the keyword. The star will match zero or more characters, similar to % in MySQL. A star in the middle of the keyword doesn’t seem to be supported (but see below). Nor is a question mark or underscore to match exactly one character. You also need to set min_infix_len = 1 for example.

If you changed the name of the index section above then you need to change the name in the sections index test1stemmed : test1 and index dist1 or comment them out.

Be aware that max_matches = 1000 or change this value in the searchd section.

Until and unless you decide to use real-time indexes, you may want to comment out all the lines in the index rt section or delete them.

In the searchd section I set preopen_indexes = 0 because otherwise I had problems with delta index updates (see below).

Index this database as we did above for the test source by entering:
indexer.exe --rotate --config c:\sphinx\sphinx.conf --all

The --rotate option means you don’t have to stop and restart the Windows service running the searchd daemon. However, if you change sphinx.conf, you should manually stop and start searchd. The --rotate option should be used only when reindexing. As of version 0.9.9-rc1 this is no longer absolutely necessary; a --rotate should be sufficient. However, it is usually best to avoid ambiguity.

If you’ve gotten this far and are using real data, then indexing will take a few minutes.

If you ever need to remove the service entirely, start cmd.exe (as an administrator) and enter sc delete SphinxSearch. Somewhere along the line the changes I made to sphinx.conf didn’t work till I did this and reinstalled the service.

Adapt your test PHP file from above to access the database and table you just indexed and load it in a browser.

Now that you’ve gotten Sphinx Search to run in your web application, you need to think about how and when to automatically reindex the data. As of this writing there is no way to automatically do a live index update, at least not with wildcard support. However, you can do an incremental update:

Set up sphinx.conf for delta index updates as described in the manual (adapting to your situation): http://sphinxsearch.com/docs/current.html#live-updates

Use the Windows Task Scheduler to index the main source once daily at slack times. I did this:
c:\sphinx\bin\indexer.exe --rotate --config c:\sphinx\sphinx.conf --all
Once daily at 4:00 am.

Use the Windows Task Scheduler to index the delta source every few minutes at peak times. I did this:
c:\sphinx\bin\indexer.exe --rotate --config c:\sphinx\sphinx.conf mydb_yourtable_delta
Every 5 minutes from 6:00 am to 11:00 pm.

Only new records are indexed every 5 minutes. Modified records only get re-indexed once a day.
You might want to put these commands in batch files.
Re-indexing from the Task Scheduler in Windows 7 didn’t work for me, but it did work in Windows XP. See: http://sphinxsearch.com/bugs/view.php?id=860

In PHP you then need to list both the main index and the delta index in the Query()
function, for example:
$cl->Query ("test query", "db1_table1, db1_table1_delta");


Update: You can simulate for example %one%two% in MySQL with *one* << *two* in Sphinx Search. There's no way to simulate an underscore for exactly one character.


Update: *one* << *two* finds one two but not onetwo.

2011-03-13

My list of favorite Android apps

I recently had to hard reset (i.e. factory reset) my Nexus One, meaning all the apps I had installed were gone, so I thought I'd list the apps I had on it or want on it to help me reinstall all of them and for future reference. Admittedly I should have done this before the hard reset since my brain's memory isn't very reliable. But what the hey!

The side effect is that my faithful readers may benefit from my suggestions. I will probably update this post as I see fit.
  • Words with Friends: The Android version of this game is extremely buggy at the moment (it crashes about 6000 times before I manage to finish a turn), but it was the first app I reinstalled so I could continue the games I was playing with my friends. A pleasant side effect of the hard reset was that games that had disappeared (due to bugs in the game I presume) magically reappeared. My opinion is that many of the bugs are due to the profuse use of ads. I would buy the paid version, but there isn't any for Android yet.
  • Scrabble Helper: For amazing your friends on how many obscure words you know in Words With Friends. OK, so it's cheating.
  • Google Sky Map: This is simply awesome! Don't know what that shiny thing in the night sky is? Just start up Google Sky Map, point your phone at it, and you'll amaze your friends with your astounding knowledge of astronomy. ("Oh, that's Venus.") Or point it at the day sky to see what you would see if the sun wasn't blotting everything out.
  • Satellite AR: Google Sky Map only shows you natural objects, but this app shows you satellites, space junk and all kinds of other man-made objects in the sky. It's especially fun to follow the International Space Station, which you can actually see with the naked eye, but who knew that without this app?
  • Iridium Flare Forecast: Satellite AR will show you what's happening right now, but this app will tell you when and where in the sky to look for the next pass of a satellite or the ISS. Now you can plan ahead for when to amaze your friends!
  • Google Goggles: There's not much I can add to what's already been said about this amazing app. Am I using the word "amaze" too much here? Well, I am amazed at everything you can do with these tiny handheld devices.
  • Barcode Scanner: Google Goggles does what this does and more. So why did I add this? I don't know. Go ask your mother.
  • Microsoft Tag Reader: You know those 2D "barcodes" with little black-and-white squares instead of bars that are called QR codes? Those things are great. You can do all kinds of stuff with them after you scan them with Google Goggles. So what does Microsoft do? The same thing they do with all great ideas. They reinvent them so they can brand them with the Microsoft name. I'm surprised they haven't come up with the "Microsoft Wheel" yet. Anyway, in case you see 2D "barcodes" with little colorful triangles (or other shapes) instead of black-and-white squares, they serve the same purpose as QR codes but are called Microsoft tags and you need a separate app to be able to read them. This is it. I only ever needed it to see a video about body painting after seeing a Microsoft tag in a magazine ad for the Sports Illustrated Swimsuit Edition. (By the way, barcodes [with bars instead of squares or triangles] are 1D.)
  • Google Translate: This isn't quite a Star Trek universal translator, but it's getting there! Did I say it's amazing?
  • LEO dictionary: My favorite German-English dictionary. I also use it for German-French, German-Spanish and German-Italian.
  • Amazon Kindle: I didn't consciously choose this over other e-book readers. It's just the first one I happened to install. Still working my way through the free e-books.
  • Cardio Trainer: This is similar to Google's My Tracks, except that you're not limited to exercise where you move from one place to another. I use it mostly at the gym. Another advantage is that it automatically exports to Google Health, which Google's own app doesn't! The free version is adequate for me.
  • Blood Pressure Log: I use this to record and analyze my blood pressure, pulse and weight. It makes great graphs that you can export or send to e-mail addresses. It was worth it to me to make a small donation so it's ad-free. It doesn't export to Google Health yet, but the developer has said he's working on it. I don't know of any Android blood pressure app that exports to Google Health.
  • Catch Notes: This used to be called 3banana, and I use it to take text notes in all kinds of situations. No more searching around for pen and paper. It also syncs automatically to the web version. Its features have grown immensely since I first installed it. I just hope it doesn't get too bloated.
  • D-Measures: I work for a company that makes shower enclosures. With this app you can, for example, take a picture of your bathroom, then add little measurement lines with arrow points. For example, you can add dimensions of where you want your shower, e.g. 90 x 100 cm and 200 cm high. You can also add angles. Then e-mail the result to your bathroom supplier for an offer. The uses are (almost) endless. It costs $1.99 and has a 15-day free trial version.
  • Compass: For when you need a compass. I don't recommend using it while driving on the freeway to figure out if you're going the wrong way.
  • Mein A1: The remarkably featureless app of my mobile phone plan provider. For example, it doesn't tell me how high I've run my roaming fees or warn me if it gets higher than a certain amount.
  • Kik Messenger: Send text messages without paying highway-robbery texting fees. Supposedly similar to BlackBerry Messenger, but I don't have a BlackBerry, so I wouldn't know. The disadvantage is that both partners need Kik Messenger, so if your partner doesn't have a smartphone you're out of luck.
  • Skype: Use your data plan instead of your telephony plan to call someone for times when it would be cheaper that way.
  • Microsoft Tag Reader: You know those 2D "barcodes" with little black and white squares instead of bars that are called QR codes? Those things are great. You can do all kinds of stuff with them. So what does Microsoft do? The same thing they do with all great ideas. They reinvent them so they can brand them with the Microsoft name and hope to make billions of dollars with it. I'm surprised they haven't come up with the "Microsoft Wheel" yet. Anyway, in case you see little 2D "barcodes" with little colorful triangles instead of black and white squares, they serve the same purpose as QR codes but are called Microsoft tags, and you need a separate app to be able to read them. This is it. I only ever needed it to see a video of the history of body painting after seeing a Microsoft tag in a magazine ad for the Sports Illustrated Swimsuit Edition. (OK, I know you can store more information in them because of the colors, but basically it's a reinvention of the wheel.)
  • Labyrinth Lite: One of the few games in my list, I don't play it much, but I like to keep it installed to show people some of the amazing things you can do with a smartphone. Yes, I still have some friends who don't have a smartphone.
  • Lotus Notes Traveler: My work mail, calendar and contacts apps. Unfortunately they don't sync with Google Mail, Calendar and Contacts, but at least I can check both calendars when making a new appointment, and I can answer work e-mails when I'm at a seminar or whatever for work.
  • Calvin and Hobbes: 'Nuff said.
  • Daily Dilbert: 'Nuff said.
  • Dropbox: Syncs your files online and across devices for access anywhere. Also a good way to send someone large files.
  • eBay: Everybody's favorite auction site.
I don't really see much point in installing competing apps to the ones pre-installed on my phone such as: Browser, Calculator, Calendar, Clock, Facebook, Gmail, Gallery, Maps, Navigation, News and Weather, and YouTube. The two exceptions are:
  • Winamp: Since Google Music is still vaporware and even if it was available wouldn't be available outside the US for about umpteen years, I used the Winamp app along with Winamp on my PC to import my iTunes library and sync that with my Nexus One. UPDATE: I would probably prefer Amazon Cloud Player, but it's not available outside the US. Darn!
  • TweetDeck: Lets you manage several Twitter accounts without having to log off and on all the time.
Following are apps that I'd kinda love to have or at least try out but can't because they're not available outside the US. Come on, guys!
  • Google Voice: Competition for Skype.
  • Square: Accept payments from anyone with an e-mail address on your Android device. Not even PayPal can do that, because the payee needs to have a PayPal account. People are much more likely to have an e-mail address than a PayPal account.
  • Amazon MP3: Including Cloud Player
And finally I'm testing the following:
  • Qype: A place rating service that seems to be based in the UK. It doesn't recognize some large cities in the US, for example Fairbanks, AK.
  • Xing: A business-oriented social network sorta like LinkedIn.
I'm looking forward to your comments on the apps above and on your favorite apps. I plan to update mine above as occasions arise.