Optimized Attribute Filtering with SphinxConnector.NET’s Fluent API

by Dennis 1. February 2013 12:11

The MySQL Performance Blog recently published an interesting article about optimizing Sphinx queries that only filter by an attribute (i.e. do not contain a full-text query). I recommend reading the article first and then coming back here, but here’s a quick summary: a Sphinx query that only filters by an attribute may be relatively slow compared to an equivalent query in a regular DBMS. The reason is that Sphinx does not let you create indexes (as in B-tree indexes) on attributes the way a DBMS does. So to retrieve the results of such a query, Sphinx has to perform a full scan of the index, which becomes more costly as the index grows.

The article describes a neat trick to get around this limitation: by adding a full-text indexed field for an attribute and querying that, one can achieve a greatly improved query time. In this post I’d like to demonstrate how this technique can be used with SphinxConnector.NET’s fluent API in conjunction with a real-time index.

The index in the article’s example contains data about books, so I’ll be using that here as well. These documents have an integer attribute for a user id that we’d like to store as a full-text indexed field. Let’s take a look at what the document model should look like and which additional settings need to be applied.

To add the user id attribute to the full-text index it needs to be converted to a string. We’ll also add a prefix to each value to avoid it being included in the results of a “regular” full-text query. To do this, we add a string property to the document model that returns the converted and prefixed value:

public class CatalogItem
{
    public int Id { get; set; }

    public int UserId { get; set; }

    public string Title { get; set; }

    public string UserIdKey
    {
        get { return "userkey_" + UserId; }
    }
}

As of Version 3.2, SphinxConnector.NET will automatically exclude any read-only property when selecting the results of a query, so no further setup is required here (it will of course still be inserted into the index during a save).

In previous versions of SphinxConnector.NET the UserIdKey property would have to be configured as follows:

fulltextStore.Conventions.IsFulltextFieldOnly = memberInfo => memberInfo.Name == "UserIdKey";

A query that uses the new field would then look like this:

IList<CatalogItem> results = session.Query<CatalogItem>().
                                     Match("@UserIdKey userkey_42").
                                     ToList();
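The stored key and the match expression have to agree on the prefix, so it can help to build both from one place. The following is only a sketch: `UserKey`, `For` and `MatchExpression` are hypothetical helper names that are not part of SphinxConnector.NET, and the field name is passed in by the caller.

```csharp
using System;

// Hypothetical helper, not part of SphinxConnector.NET.
public static class UserKey
{
    // Prefix that keeps the key out of "regular" full-text results.
    public const string Prefix = "userkey_";

    // Value stored in the full-text field, e.g. "userkey_42".
    public static string For(int userId)
    {
        return Prefix + userId;
    }

    // Match expression restricted to the given field via the @field operator.
    public static string MatchExpression(string fieldName, int userId)
    {
        return "@" + fieldName + " " + For(userId);
    }
}

public static class Program
{
    public static void Main()
    {
        // prints "@UserIdKey userkey_42"
        Console.WriteLine(UserKey.MatchExpression("UserIdKey", 42));
    }
}
```

The document model's UserIdKey getter could then return UserKey.For(UserId), and the string passed to Match would come from UserKey.MatchExpression.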

For the sake of completeness, here’s the corresponding Sphinx configuration:

index catalog
{
    type = rt
    path = catalog
    rt_field = title
    rt_field = useridkey
    rt_attr_string = title
    rt_attr_uint = userid
}


SphinxConnector.NET 3.2 has been released

by Dennis 30. January 2013 11:50

We're pleased to announce the immediate availability of a new release of SphinxConnector.NET! Among other things, we've been busy adding support for Sphinx 2.1, and in the course of that work we've made several optimizations that should improve performance and reduce memory usage with SphinxQL and the fluent API.

The latter now properly supports enums and has gotten support for JSON attributes that are going to be introduced with Sphinx 2.1. The methods First() and FirstOrDefault() can now also be executed as futures, and we've added the possibility to perform operations like attaching and flushing indexes to the fluent API.

There are several other additions, improvements, and bugfixes which are listed in the version history.


A Quick Way to Set Up Logging during Development

by Dennis 18. January 2013 09:29

I’ve been asked a few times if there’s a quick way to get logging output from SphinxConnector.NET without setting up a “real” logging framework like NLog. Here’s one: Common.Logging comes with two adapters named TraceLoggerFactoryAdapter and ConsoleOutLoggerFactoryAdapter. The latter (obviously) logs messages to the console, while the former logs messages via .NET’s Trace class. One nice thing about the trace log is that it can be accessed via Visual Studio’s ‘Output’ window (CTRL+ALT+O) if your application is running with a debugger attached (F5).

Here is the relevant code:

[Conditional("DEBUG")]
private static void SetupLogging()
{
    LogManager.Adapter = new TraceLoggerFactoryAdapter
    {
        Level = LogLevel.All,
        ShowLevel = true,
        ShowDateTime = true,
    };
}

I also added a Conditional attribute to the setup method to ensure that it is only being called in a debug configuration.
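To illustrate what the Conditional attribute does, here is a small self-contained sketch. It does not use Common.Logging; the symbol SETUP_LOGGING stands in for DEBUG so that the result does not depend on the build configuration, and all names in it are made up for the demonstration.

```csharp
#define SETUP_LOGGING // stands in for DEBUG in this sketch

using System;
using System.Diagnostics;

public static class ConditionalDemo
{
    public static int Calls;

    // Calls to this method are compiled because SETUP_LOGGING is defined above.
    [Conditional("SETUP_LOGGING")]
    public static void WhenDefined()
    {
        Calls += 1;
    }

    // Calls to this method are dropped by the compiler: the symbol is never defined.
    [Conditional("SYMBOL_THAT_IS_NOT_DEFINED")]
    public static void WhenNotDefined()
    {
        Calls += 100;
    }

    public static int Run()
    {
        Calls = 0;
        WhenDefined();    // executed
        WhenNotDefined(); // removed at compile time
        return Calls;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 1
    }
}
```

Note that the attribute removes the call sites, not the method itself, which is why it can only be applied to methods that return void.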


Using SphinxConnector.NET with ASP.NET MVC

by Dennis 21. December 2012 11:41

Since using Sphinx from a web application is probably the most common use case, I thought I’d post some guidelines and examples on how to use the fluent API in an ASP.NET MVC application with regard to setup and proper handling of IFulltextStore and IFulltextSession. The documentation already mentions that there should (usually) be one instance of IFulltextStore per application and one IFulltextSession per thread/(web-) request. Let’s take a look at a few different approaches to this:

Using Lazy<T>

This approach makes use of the Lazy<T> class that was introduced with .NET 4.0. We create a base controller that holds the IFulltextStore instance which will be initialized upon the first access. Lazy<T> will make sure that the FulltextStore is created only once in a thread-safe way.

public abstract class SearchController : Controller
{
    private static readonly Lazy<IFulltextStore> Store = new Lazy<IFulltextStore>(() =>
    {
        IFulltextStore fulltextStore = new FulltextStore().Initialize();
        fulltextStore.ConnectionString.IsThis("pooling=true");

        return fulltextStore;
    });

    protected static IFulltextStore FulltextStore
    {
        get { return Store.Value; }
    }

    protected IFulltextSession FulltextSession { get; private set; }

    protected override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        FulltextSession = FulltextStore.StartSession();
    }

    protected override void OnActionExecuted(ActionExecutedContext filterContext)
    {
        if (filterContext.IsChildAction || FulltextSession == null)
            return;

        using (FulltextSession)
        {
            if (filterContext.Exception != null)
                return;

            FulltextSession.FlushChanges();
        }
    }
}

The IFulltextSession for every request is created in an override of OnActionExecuting by assigning the result of StartSession to the FulltextSession property. This way, every controller that inherits from SearchController automatically gets an open session that is ready for use. In the override of OnActionExecuted we tell the FulltextSession to flush all pending changes. The using statement ensures that it is properly disposed of.
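The once-only guarantee that Lazy<T> gives for the store can be checked in isolation. The sketch below uses a plain object as a stand-in for the FulltextStore and counts how often the factory runs when many threads read Value at the same time; the class and member names are made up for the demonstration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LazyDemo
{
    private static int factoryCalls;

    // Stand-in for the FulltextStore initialization in the base controller.
    // Lazy<T>(Func<T>) defaults to LazyThreadSafetyMode.ExecutionAndPublication,
    // so the factory runs at most once even under concurrent access.
    private static readonly Lazy<object> Store = new Lazy<object>(() =>
    {
        Interlocked.Increment(ref factoryCalls);
        return new object();
    });

    public static int Run()
    {
        // Read Store.Value from many threads at once.
        Parallel.For(0, 100, i =>
        {
            object store = Store.Value;
        });

        return factoryCalls;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 1
    }
}
```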

Using an IoC-Container

Following is an example installer for Castle Windsor:

public class SphinxConnectorInstaller : IWindsorInstaller
{
    public void Install(IWindsorContainer container, IConfigurationStore store)
    {
        container.Register(Component.For<IFulltextStore>().
                                     Instance(new FulltextStore().Initialize()).
                                     LifestyleSingleton(),
                           Component.For<IFulltextSession>().
                                     UsingFactoryMethod(kernel =>
                                         kernel.Resolve<IFulltextStore>().StartSession()).
                                     LifestylePerWebRequest());
    }
}

In this example, we set up Castle Windsor so that it can create both IFulltextStore and IFulltextSession. If you wanted to create IFulltextSession yourself (by injecting IFulltextStore into your classes and calling StartSession), you could remove the corresponding code from the installer.

We instruct Windsor to use the Singleton lifestyle for IFulltextStore, which means that Windsor will create one instance per container. Singleton is in fact Windsor’s default lifestyle, but in cases like this I like to make that explicit, so that developers who are not familiar with Windsor immediately see what’s going on. For IFulltextSession we set LifestylePerWebRequest so that Windsor will create an instance for each request; it will also automatically call Dispose at the end of each request, so we don’t have to worry about that. If you wanted Windsor to also call FlushChanges, you could do so with the help of Windsor’s OnDestroy method.

Initialization at Application Startup

As with the first approach, we create a base controller, this time with a static property that holds the IFulltextStore instance. The instance is initialized in the Global.asax.cs file in Application_Start:

public abstract class SearchController : Controller
{
    public static IFulltextStore FulltextStore { get; set; }

    protected IFulltextSession FulltextSession { get; private set; }

    //Overrides of OnActionExecuting and OnActionExecuted omitted
}

protected void Application_Start()
{
    AreaRegistration.RegisterAllAreas();

    RegisterGlobalFilters(GlobalFilters.Filters);
    RegisterRoutes(RouteTable.Routes);

    InitFulltextStore();
}

private static void InitFulltextStore()
{
    IFulltextStore fulltextStore = new FulltextStore().Initialize();
    fulltextStore.ConnectionString.IsThis("pooling=true");

    SearchController.FulltextStore = fulltextStore;
}


Importing Data into Sphinx RT-Indexes with SphinxConnector.NET’s Fluent API

by Dennis 19. November 2012 12:04

If you are facing the task of importing data into a Sphinx RT-index, SphinxConnector.NET’s fluent API makes this really easy with just a couple of lines of code (the document model class is omitted for brevity):

void Import()
{
    IFulltextStore fulltextStore = new FulltextStore().Initialize();
    fulltextStore.ConnectionString.IsThis("pooling=true");

    int count = 0;

    using (IFulltextSession session = fulltextStore.StartSession())
    {
        foreach (var document in GetDocuments())
        {
            session.Save(document);

            if (++count % fulltextStore.Settings.SaveBatchSize == 0)
                session.FlushChanges();
        }

        session.FlushChanges();   
    }
}   

The important part is the call to FlushChanges each time a batch of documents has been passed to Save. This avoids high memory usage when importing many documents, because SphinxConnector.NET has to keep each document in memory until FlushChanges is called (though for smaller datasets it might be acceptable to flush all changes at the end of the import process).
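The flush-every-N pattern from the import loop can be tried out without a running Sphinx instance. In the sketch below, RecordingSession is a hypothetical stand-in for a session: it only records how many pending documents each FlushChanges call would have sent, and it treats a flush with nothing pending as a no-op, which is an assumption of this sketch rather than documented library behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a session; it only records what would be sent.
public sealed class RecordingSession
{
    private readonly List<int> pending = new List<int>();

    // Number of documents written by each flush, in order.
    public readonly List<int> FlushSizes = new List<int>();

    public void Save(int document)
    {
        pending.Add(document);
    }

    public void FlushChanges()
    {
        if (pending.Count == 0)
            return; // nothing to send (assumption of this sketch)

        FlushSizes.Add(pending.Count);
        pending.Clear();
    }
}

public static class BatchImportDemo
{
    public static List<int> Import(int documentCount, int batchSize)
    {
        var session = new RecordingSession();
        int count = 0;

        for (int document = 0; document < documentCount; document++)
        {
            session.Save(document);

            // Flush whenever a full batch has been collected.
            if (++count % batchSize == 0)
                session.FlushChanges();
        }

        // Flush the remainder that did not fill a whole batch.
        session.FlushChanges();

        return session.FlushSizes;
    }

    public static void Main()
    {
        // prints "16, 16, 8"
        Console.WriteLine(string.Join(", ", Import(40, 16)));
    }
}
```

With 40 documents and a batch size of 16, at most 16 documents are pending at any time, and the final call picks up the remaining 8.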

Not only is this much simpler than writing SphinxQL by hand, it’s also faster because of SphinxConnector.NET’s automatic batching. The default value for SaveBatchSize is 16, which provides good performance, but it can of course be adjusted for environments where a larger batch size yields even better performance.
