If you have a MySQL-based website with lots of records (like hundreds of thousands) and run a full-text search on them, you will probably run into performance problems. The first thing most people probably do is use LIKE with the % wildcard. This quickly becomes too slow, of course, because MySQL has to search through the entire text of every record of the table and can’t use any indexes.
The first solution most people think of is MySQL’s own full-text index capability. However, this doesn’t work very well if you run the query on many text fields per record. In my case, the search was even slower.
Sooner or later, you have to think about creating additional full-text indexes. At a seminar once I heard about how powerful the Apache Lucene project is, so that’s the first thing I looked into. However, I quickly found out that it’s very complicated and requires a larger learning curve than I was willing to invest. In the online documentation for Lucene somewhere it said that if Lucene is too low-level for you then to try Apache Solr, so I looked into that. Solr is based on Lucene and runs on Java, so that means you need to install Apache Tomcat if you haven’t already, which I didn’t. I looked into installing it on my XAMPP installation and realized I would need to either update XAMPP to a version that included it or add it manually. This was turning into a larger project than I wanted it to be even before I could begin to think about the search server itself. So I looked into something else.
Here are some lists of search engine software (search servers):
- http://en.wikipedia.org/wiki/Category:Free_search_engine_software
- http://onlamp.com/pub/a/php/2006/02/16/search-engine-showdown.html
Sphinx Search caught my eye because it was specifically designed to index MySQL databases, which is what I wanted primarily, although you can index just about anything else too, like XML files and web pages. It was also mentioned as particularly easy to implement.
This is true, although, of course, there are pitfalls, especially since, like most things web server, it is mainly aimed at Linux and not Windows installations. And this is the main reason for this tutorial, to help Windows users install and implement Sphinx Search with as little trouble as possible.
Resources that helped me:
- http://sphinxsearch.com/docs/2.0.1/
- http://sphinxsearch.com/forum/view.html?id=2972
- http://www.crankberryblog.com/2011/intalling-sphinx-on-wamp-localhost-windows
- http://sphinxsearch.com/forum/view.html?id=4986
Decide which release you want, generally the latest stable beta release as described here: http://sphinxsearch.com/downloads/
Download “Win32 binaries w/MySQL support” here: http://sphinxsearch.com/downloads/beta/
Unzip the zip file to C:\Sphinx. This directory should now include a few files, such as sphinx.conf.in, and subdirectories, such as api and bin.
Add the subdirectories data and log.
Copy the file C:\Sphinx\sphinx.conf.in to C:\Sphinx\sphinx.conf.
Open C:\Sphinx\sphinx.conf in a text editor.
Change the settings sql_user and sql_pass to match your MySQL installation.
Find and replace all occurrences of @CONFDIR@/ with C:\Sphinx\, at least in the uncommented lines.
Find and replace all occurrences of forward slash (/) with backslash (\) in these same lines. For example @CONFDIR@/data/test1 becomes C:\Sphinx\data\test1.
Leave the source and index settings as they are for now for testing purposes.
If MySQL isn’t running already, start it now.
Start cmd.exe as an administrator (click the Windows Start icon, type cmd in the search box without hitting Enter, right-click "cmd.exe", click “Run as administrator”). On Windows XP this is easier: no need to worry about administrator rights.
Go to your MySQL installation directory, for example enter cd c:\xampp\mysql.
Create the test database and fill it with data by entering:
mysql -u [youruser] -p < c:\sphinx\bin\example.sql (replace [youruser] with your user name).
Enter cd c:\sphinx\bin.
Index the test database by entering indexer.exe --config c:\sphinx\sphinx.conf test1.
Install Sphinx as a service by entering:
searchd.exe --install --config c:\sphinx\bin\sphinx.conf --servicename SphinxSearch
Start the service: Go to Control panel > Administrative tools > Services. Right-click on "SphinxSearch" in the list and click "Start".
Test the search by entering: search.exe --config c:\sphinx\sphinx.conf test. This should result in some matches for the search term “test”.
Copy the Sphinx Search PHP API from C:\Sphinx\api\sphinxapi.php to your site’s HTML directory, for example C:\xampp\htdocs\[yoursite]\sphinxapi.php.
Create a test PHP file with the following contents:
<?php
mysql_connect("localhost", "root", [yourpassword]);
mysql_select_db("test");
require_once('sphinxapi.php');
$s = new SphinxClient;
$s->setServer("127.0.0.1", 9312); // NOT "localhost" under Windows 7!
$s->setMatchMode(SPH_MATCH_EXTENDED2);
$s->SetLimits(0, 25);
$result = $s->Query("test");
if ($result['total'] > 0) {
echo 'Total: ' . $result['total'] . "<br>\n";
echo 'Total Found: ' . $result['total_found'] . "<br>\n";
echo '<table>';
echo '<tr><td>No.</td><td>ID</td><td>Group ID</td><td>Group ID 2</td><td>Date Added</td><td>Title</td><td>Content</td></tr>';
foreach ($result['matches'] as $id => $otherStuff) {
$row = mysql_fetch_array(mysql_query("select * from documents where id = $id"));
extract($row);
++ $no;
echo "<tr><td>$no</td><td>$id</td><td>$group_id</td><td>$group_id2</td><td>$date_added</td><td>$title</td><td>$content</td></tr>";
}
echo '</table>';
} else {
echo 'No results found';
}
?>
Run the PHP file in a browser. You should get a table of results that include the search term “test”.
Now it’s time to RTFM to implement this in your own application. You’ll need to adapt sphinx.conf to index your own MySQL database (http://sphinxsearch.com/docs/2.0.1/conf-reference.html) and adapt the API settings to suit your needs (http://sphinxsearch.com/docs/2.0.1/api-reference.html).
For example, in sphinx.conf you probably want to change the source src1 section to something meaningful. I use [database]_[table], for example source mydb_yourtable, since there can be more than one source section. I figure that avoids confusion, and if I have multiple databases/sources I can simply specify the list of indexes to search in the Query() call, for example:
$cl->Query ("test query", "db1_table1, db2_table3");
Change sql_db to your database.
Change sql_query_info to your table.
You also need to change sql_query to access the MySQL table you want to index. The first field must be a unique, unsigned, positive, integer ID. If your table doesn't have one, you need to add it to the table first. This is how you will later reference the rows found in PHP.
For example:
ALTER TABLE yourtable ADD id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
Adapt or delete the settings sql_attr_uint and sql_attr_timestamp.
If you changed the name of the source section then you need to change the name or comment out the section source src1throttled : src1.
You probably want to also change the index test1 section to something meaningful. I use [database]_[table] as above, for example index mydb_yourtable.
Change the source setting in the index section to the whatever you named the source above, e.g. mydb_yourtable.
You can also change the path directive to something more sensible than test1.
If you want wildcard support, set enable_star = 1. This means that star (*) can be used at the beginning and/or the end of the keyword. The star will match zero or more characters, similar to % in MySQL. A star in the middle of the keyword doesn’t seem to be supported (but see below). Nor is a question mark or underscore to match exactly one character. You also need to set min_infix_len = 1 for example.
If you changed the name of the index section above then you need to change the name in the sections index test1stemmed : test1 and index dist1 or comment them out.
Be aware that max_matches = 1000 or change this value in the searchd section.
Until and unless you decide to use real-time indexes, you may want to comment out all the lines in the index rt section or delete them.
In the searchd section I set preopen_indexes = 0 because otherwise I had problems with delta index updates (see below).
Index this database as we did above for the test source by entering:
indexer.exe --rotate --config c:\sphinx\sphinx.conf --all
The --rotate option means you don’t have to stop and restart the Windows service running the searchd daemon. However, if you change sphinx.conf, you should manually stop and start searchd. The --rotate option should be used only when reindexing. As of version 0.9.9-rc1 this is no longer absolutely necessary; a --rotate should be sufficient. However, it is usually best to avoid ambiguity.
If you’ve gotten this far and are using real data, then indexing will take a few minutes.
If you ever need to remove the service entirely, start cmd.exe (as an administrator) and enter sc delete SphinxSearch. Somewhere along the line the changes I made to sphinx.conf didn’t work till I did this and reinstalled the service.
Adapt your test PHP file from above to access the database and table you just indexed and load it in a browser.
Now that you’ve gotten Sphinx Search to run in your web application, you need to think about how and when to automatically reindex the data. As of this writing there is no way to automatically do a live index update, at least not with wildcard support. However, you can do an incremental update:
Set up sphinx.conf for delta index updates as described in the manual (adapting to your situation): http://sphinxsearch.com/docs/current.html#live-updates
Use the Windows Task Scheduler to index the main source once daily at slack times. I did this:
c:\sphinx\bin\indexer.exe --rotate --config c:\sphinx\sphinx.conf --all
Once daily at 4:00 am.
Use the Windows Task Scheduler to index the delta source every few minutes at peak times. I did this:
Once daily at 4:00 am.
Use the Windows Task Scheduler to index the delta source every few minutes at peak times. I did this:
c:\sphinx\bin\indexer.exe --rotate --config c:\sphinx\sphinx.conf mydb_yourtable_delta
Every 5 minutes from 6:00 am to 11:00 pm.
Only new records are indexed every 5 minutes. Modified records only get re-indexed once a day.
Every 5 minutes from 6:00 am to 11:00 pm.
Only new records are indexed every 5 minutes. Modified records only get re-indexed once a day.
You might want to put these commands in batch files.
Re-indexing from the Task Scheduler in Windows 7 didn’t work for me, but it did work in Windows XP. See: http://sphinxsearch.com/bugs/view.php?id=860
In PHP you then need to list both the main index and the delta index in the Query() function, for example:
$cl->Query ("test query", "db1_table1, db1_table1_delta");
Update: You can simulate for example %one%two% in MySQL with *one* << *two* in Sphinx Search. There's no way to simulate an underscore for exactly one character.
Update: *one* << *two* finds one two but not onetwo.
Re-indexing from the Task Scheduler in Windows 7 didn’t work for me, but it did work in Windows XP. See: http://sphinxsearch.com/bugs/view.php?id=860
In PHP you then need to list both the main index and the delta index in the Query() function, for example:
$cl->Query ("test query", "db1_table1, db1_table1_delta");
Update: You can simulate for example %one%two% in MySQL with *one* << *two* in Sphinx Search. There's no way to simulate an underscore for exactly one character.
Update: *one* << *two* finds one two but not onetwo.
thanks a lot.
ReplyDeletehowever test php script shows no results.
can you help?
thanks
Restart your Sphinx Service. If you are using windows then it can be found in taskmanager->services.
Deletehow to add variable on query
ReplyDeleteexample im used without sphinx:
$query = "SELECT
SQL_CALC_FOUND_ROWS
id, title, description, price, region, images, url, created, type, categoryId
FROM
cbmposts
WHERE "
. $this->getCategory($this->category)
. $this->getPrice() // price between
. $this->get_ad_type()//forsale, rent, wanted. wanted to rent
. $this->getRegion($this->region) //entire malaysia, neighbour region, current region
. $this->matchSearch() //AND MATCH(title, description) AGAINST ('$this->query')
. " ORDER BY id DESC";
return $query;
Hello Robbin, thank you for the great tutorial!
ReplyDeleteWhere can I find a patched mysql with sphinxse for windows?
ReplyDeleteThats interesting.
ReplyDeleteJust drop by to say thanks ...
ReplyDeleteI have been reading quite a lot about how to really use Sphinx with PHP-MySQL, so far your article is the easiest to understand.
[Edy]
Hello Robbin Thank You!
ReplyDeleteI'm no expert but I noticed you are querying your MySQL database 25 times inside of a loop. Wouldn't it be far more efficient to get the list of ID's Sphinx returns and use ONE query to the database using WHERE id IN (123,124,125,126,127,128)? 40,000 page views creates 1,000,000 queries, without leaving page 1!!
ReplyDeleteThought I'd share what I do. I do one MySQL query to grab all records in one go using an IN clause.
Delete1. Connect to Sphinx and query it.
$sphinx = new SphinxClient;
$sphinx->setLimits((int)$offset, (int)$limit);
$sphinxResponse = $sphinx->query($query, "main delta");
2. Get ID's from Sphinx and create paramterised IN (?,?,?,?,?) query.
$sphinxIDs = array_keys($sphinxResponse['matches']);
$sqlInQuery = implode(',', array_fill(0, count($sphinxIDs), '?'));
3. Get database records into an associative array
$STH = $DBH->prepare('
SELECT articleID, heading, subheading
FROM articles
WHERE articleID IN ('.$sqlInQuery.')
');
$STH->execute($sphinxIDs);
$rows = array();
while ($row = $STH->fetch()) {
$rows[$row['articleID']] = $row;
}
4. To display results, loop through Sphinx ID's grabbing the matching database row from associative array.
foreach ($sphinxIDs as $c => $id) {
$article = $rows[$id];
echo $article['articleID'] . '
';
echo $article['heading'] . '
';
echo $article['subheading] . '
';
}
This method is very useful, as you don't have to worry about MySQL having to order the results - they'll be in the same order as they were when Sphinx returned them. This will also perform much better than the example in this article. Based on 40,000 page views, this method saves 960,000 database queries.
Can't find search.exe in c:\sphinx\bin Folder pls help me
ReplyDeletethe correct path is C:\sphinx
Deletehelp me.., iam not start running...
ReplyDeleteWindows could not start the SphinxSearch service on Local Computer.
Error 1067: The process terminated unexpectedly.
Open cmd.exe (command prompt) with Administrator Rights.
DeleteI am using Windows XP, and it's not working. Giving me an error "Windows could not start the SphinxSearch service on Local Computer. Error 1067: The process terminated unexpectedly."
ReplyDeleteRun cmd.exe (command prompt) with Administrative Rights.
Deletetry to run command prompt as administrator and after that continue the steps....
ReplyDelete@"Error 1067: The process terminated unexpectedly."
ReplyDeleteDon't type stupid 1:1 was the autor write. The command is wrong:
searchd.exe --install --config c:\sphinx\bin\sphinx.conf --servicename SphinxSearch
in \bin is not a sphinx.conf. You must write
searchd.exe --install --config c:\sphinx\sphinx.conf --servicename SphinxSearch
To remove the wrong service use this page: http://www.howtogeek.com/howto/windows-vista/how-to-delete-a-windows-service-in-vista-or-xp and add it again with right path.
I am getting only two results, any one please help
ReplyDelete