Introduction to Sphinx Real-Time Indexes
In this tutorial we'll look at the new real-time index type that ships with Sphinx 1.10.1 and setup an index for parts of the Stackoverflow data dump (the postings made by Stackoverflow users to be precise) which the Stackoverflow team kindly provides under a Creative Commons license. We will then import some data into the index with SphinxConnector.NET. Note, that you do not need to download the complete Stackoverflow dump for this tutorial! A small sample containing about 10,000 posts is included in the download.
Why Real-Time Indexes?
When using real-time indexes, it is not necessary anymore to run the Sphinx indexing process which is responsible for retrieving data from a data source (database or xml files) and creating the index. Instead, you insert, update, and delete your data with an SQL-like query language dubbed SphinxQL, just as you would do in a "regular" database.
Prerequisites
As stated above, real-time indexes are supported from version 1.10.1. The current stable release can be downloaded from here. Additionally, you'll need to download and install SphinxConnector.NET.
Index Configuration
Important! If you make any changes to the schema of an existing RT-index during development, e.g. you add or remove an attribute, you need to delete the index files for Sphinx to recognize theses changes.
As with "regular" indexes, a real-time index needs to be setup in the Sphinx config file. One major difference is that we obviously don't need a source section since we insert our data via SphinxQL. Below is the definition for an index that will hold the data for posting data from the Stackoverflow data dump. Note, that we don't have to define a column for the id, Sphinx adds one automatically:
index so_posts { type = rt path = so_posts rt_field = title rt_field = body rt_field = tags rt_attr_uint = post_type charset_type = utf-8 }
To be able to use SphinxQL, we also need to enable Sphinx' MySQL protocol support, because that is what Sphinx uses to communicate with clients that want to issue SphinxQL commands. We do this by simply adding the following line to the searchd config section: listen = localhost:9306:mysql41
With rt_field, we declare a full-text field, i.e. when we perform a query Sphinx will look for matches in these field(s). In our case we define one field for the title of a post, one for the posts content (body) and one for the posts tags. Additionally, we create an attribute for the post type with rt_attr_uint.
Filling the Index with Data
To insert data into our index, we'll use the classes from the SphinxQL namespace in SphinxConnector.NET. We'll only show parts of the complete example here, the code to read the data from the XML file is omitted for brevity. The complete example, including a small part of the StackOverflow data is available here.
using (SphinxQLConnection connection = new SphinxQLConnection()) { SphinxQLCommand command = connection.CreateCommand(); command.CommandText = @"INSERT INTO so_posts (id, title, body, tags, post_type) VALUES (@postId, @title, @body, @tags, @post_type)"; command.Parameters.Add("postId", 0, SphinxType.UInt); command.Parameters.Add("title", String.Empty, SphinxType.String); command.Parameters.Add("body", String.Empty, SphinxType.String); command.Parameters.Add("tags", String.Empty, SphinxType.String); command.Parameters.Add("post_type", String.Empty, SphinxType.UInt); connection.Open(); foreach (var post in posts) { command.Parameters["postId"].Value = post.Id; command.Parameters["title"].Value = post.Title; command.Parameters["body"].Value = post.Body; command.Parameters["tags"].Value = post.Tags; command.Parameters["post_type"].Value = post.PostTypeId; command.ExecuteNonQuery(); } }
The code to insert our data in the index is straightforward: we create an instance of the SphinxQLConnection class and a SphinxQLCommand associated with the connection. We then specify our insert command text and create parameters for each column. By using parameters, SphinxConnector.NET will make sure that the strings we are inserting are correctly escaped. Lastly, we open the connection and iterate over the rows to insert.
Updating and Deleting Data
For updating a real-time index SphinxQL provides the REPLACE command and for deleting, no surprises here, the DELETE command. The syntax of the REPLACE command is analogous to the INSERT command which is why we won't show it here. If we wanted to delete a post from the index our statement would look like this:
using (SphinxQLConnection connection = new SphinxQLConnection()) { SphinxQLCommand command = connection.CreateCommand(); command.CommandText = "DELETE FROM so_posts WHERE id = @postId"; command.Parameters.Add("postId", 10); connection.Open(); command.ExecuteNonQuery(); }
Automatic Statement Generation With SphinxConnector.NET
If you don't feel like writing all that SphinxQL code yourself and rather have SphinxConnector.NET do it for you, have a look at SphinxConnector.NET's fluent API. It is able to automatically generate statements for saving/updating and deleting documents in real-time indexes. There's also a series (part 1, part 2, part 3) in our blog, where we build a complete ASP.NET MVC site based on the Stack Overflow data, which we used in this tutorial.
Want some extra help?
No problem, we offer consulting and custom development for Sphinx and .NET! Interested?To get in touch drop us a line!
Sample Download