Importing Data into Sphinx RT-Indexes with SphinxConnector.NET’s Fluent API

by Dennis 19. November 2012 12:04

If you are facing the task of importing data into a Sphinx RT-index, SphinxConnector.NET’s fluent API makes this easy with just a couple of lines of code (the document model class is omitted for brevity):

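void Import()
{
    IFulltextStore fulltextStore = new FulltextStore().Initialize();
    fulltextStore.ConnectionString.IsThis("pooling=true");

    int count = 0;

    using (IFulltextSession session = fulltextStore.StartSession())
    {
        foreach (var document in GetDocuments())
        {
            // Queue the document; it is kept in memory until FlushChanges is called.
            session.Save(document);

            // Flush each time a full batch has been queued to keep memory usage low.
            if (++count % fulltextStore.Settings.SaveBatchSize == 0)
                session.FlushChanges();
        }

        // Flush whatever is left over from the last, partial batch.
        session.FlushChanges();
    }
}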

The important part is the call to FlushChanges each time a batch of documents has been passed to Save. This avoids high memory usage when importing many documents, because SphinxConnector.NET has to keep each document in memory until FlushChanges is called (though for smaller datasets it might be acceptable to flush all changes at the end of the import process).

Not only is this much simpler than writing SphinxQL by hand, it’s also faster because of SphinxConnector.NET’s automatic batching. The default value for SaveBatchSize is 16, which provides good performance, but it can of course be raised in environments where a larger batch size yields even better performance.
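The flush-every-N pattern from the import loop can be tried out without a Sphinx server. The following is only a minimal sketch of the idea: BatchingSketch and ProcessInBatches are hypothetical names and not part of SphinxConnector.NET, and the flush callback merely stands in for FlushChanges.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: these names are hypothetical and do not exist in SphinxConnector.NET.
public static class BatchingSketch
{
    // Buffers items and hands them to 'flush' every 'batchSize' items, plus once
    // at the end for any remainder. Returns the number of flush calls made.
    public static int ProcessInBatches<T>(
        IEnumerable<T> items, int batchSize, Action<IReadOnlyList<T>> flush)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var pending = new List<T>(batchSize);
        int flushes = 0;

        foreach (var item in items)
        {
            pending.Add(item);              // stands in for session.Save(document)

            if (pending.Count == batchSize)
            {
                flush(pending.ToArray());   // stands in for session.FlushChanges()
                pending.Clear();            // the buffer never grows beyond one batch
                flushes++;
            }
        }

        if (pending.Count > 0)              // final, partial batch
        {
            flush(pending.ToArray());
            flushes++;
        }

        return flushes;
    }
}
```

With 40 items and a batch size of 16, the callback is invoked three times, with 16, 16 and 8 items, so at most 16 items are buffered at any point.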


Tutorial | How-to

SphinxConnector.NET 3.0.4 released

by Dennis 11. September 2012 10:13

This is just a small maintenance release that contains a few bugfixes. A list of resolved issues is available in the version history. NuGet users can update to the latest version via the package manager; a ZIP package can be downloaded from the download page.


Announcements

SphinxConnector.NET 3.0 has been released

by Dennis 3. September 2012 10:03

We are pleased to announce that the new major version of SphinxConnector.NET is now available for download!

Those of you who have been following the blog already know about the big new feature coming with this release: the fluent query API. The fluent API provides you with a LINQ-like query API to design your full-text queries. It operates directly on your document models and also lets you comfortably save and delete documents from real-time indexes.

A much more detailed description is available on the features page.

Another highlight of this release is the newly added support for the Mono runtime. Additionally, we've upgraded Common.Logging to version 2, which provides support for recent releases of the supported logging frameworks. We've also added support for running SphinxConnector.NET in medium-trust environments. There are a bunch of other improvements which are listed in the version history.

SphinxConnector.NET is now available as a NuGet package, which we know many of you have been waiting for!

Licensing and Upgrading

With the new release we're switching to a subscription-based licensing system. All new purchases and upgrades come with a one-year upgrade subscription, which gives you access to all major and minor releases made during the subscription period. At the end of the subscription period you can renew your license for just 40% of the then-current price.

If you bought your license in 2012, you will receive SphinxConnector.NET 3.0 and all other releases made this year for free! Afterwards you can renew your licenses under the conditions outlined above.

If you bought your license before 2012, you can also renew your license for just 40% of the current price!

We are also introducing a new license type, the 'Large Team License' for up to eight developers, to make up for the fact that we had to raise the price for the site license quite a bit more than we wished. If you have purchased a Site License you can downgrade to a Large Team License if you're eligible.

You can now also purchase a premium support subscription along with your license or license renewal. All details can be found on our purchase page.

If you would like to send us feedback about the new version, you can use the contact form or send an e-mail to contact@sphinxconnector.net.


Announcements

Introducing the Fluent Query API Part 4 of n: Saving and Deleting Documents

by Dennis 21. August 2012 14:54

Disclaimer: The API presented here is still under development, so there might be changes until the final release. If you have any suggestions or comments, post them here, over at Uservoice, or drop me a mail!

While SphinxConnector.NET 3.0 is nearing completion and will be released soon, I wanted to write about another nice feature of the fluent API. In the first few posts we’ve focused exclusively on querying. But of course, you’ll also be able to save and delete documents in your (real-time) indexes.

Saving Documents

We’ll use the following class as our document model:

public class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Author { get; set; }
    public decimal Price { get; set; }
    public bool EbookAvailable { get; set; }
    public DateTime ReleaseDate { get; set; }
    public IList<long> Categories { get; set; }
    public int Weight { get; set; }
}

which is based on the following index definition:

index books
{
    type                 = rt
    path                 = books

    rt_field             = title
    rt_attr_string       = title
    rt_field             = author
    rt_attr_string       = author
    rt_attr_float        = price
    rt_attr_timestamp    = releasedate
    rt_attr_uint         = ebookavailable
    rt_attr_multi        = categories

    charset_table        = 0..9, A..Z->a..z, a..z
    charset_type         = utf-8
}

The IFulltextSession interface provides a Save method with two overloads so we can either save a single document or an enumerable of documents.

void Save(object document);

void Save<TDocument>(IEnumerable<TDocument> documents);

Let’s insert two books into our index:

IFulltextStore fulltextStore = new FulltextStore().Initialize();

using (IFulltextSession session = fulltextStore.StartSession())
{
    session.Save(new Book
    {
        Id = 1,
        Author = "George R.R. Martin",
        Title = "A Game of Thrones: A Song of Ice and Fire: Book One",
        EbookAvailable = true,
        Categories = new long[] { 1, 2 },
        Price = 5.60m,
        ReleaseDate = new DateTime(1997, 8, 4)
    });

    session.Save(new Book
    {
        Id = 2,
        Author = "George R.R. Martin",
        Title = "A Clash of Kings: A Song of Ice and Fire: Book Two",
        EbookAvailable = true,
        Categories = new long[] { 1, 2 },
        Price = 7.10m,
        ReleaseDate = new DateTime(2000, 9, 5)
    });

    session.FlushChanges();
}

The above code is pretty straightforward: we create two instances of the Book class and pass them to the Save method. When we’re done, we call FlushChanges to tell SphinxConnector.NET that all pending saves and deletes should be executed.

This is where things get interesting: When SphinxConnector.NET detects that it needs to save more than one document, it inserts them in batches by generating a single REPLACE statement for each batch:

REPLACE INTO `books` (id, title, author, price, ebookavailable, releasedate, categories) 
VALUES (1, 'A Game of Thrones: A Song of Ice and Fire: Book One', 'George R.R. Martin', 5.60, 1,
870645600, (1, 2)), (2, 'A Clash of Kings: A Song of Ice and Fire: Book Two', 'George R.R. Martin', 7.10, 1,
968104800, (1, 2))

This leads to a speed-up by several orders of magnitude compared to inserting documents one by one. The batch size is of course configurable, so you can fine-tune it to your workload.
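To make the batching idea concrete, here is a rough sketch of how several rows can be folded into one multi-row REPLACE statement like the one shown above. It is not SphinxConnector.NET's actual implementation: ReplaceStatementSketch, Build and FormatValue are hypothetical names, and the value formatting and escaping rules are simplified assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Illustrative only: hypothetical helper, not part of SphinxConnector.NET.
public static class ReplaceStatementSketch
{
    // Renders one value as a SphinxQL literal (simplified, assumed rules).
    static string FormatValue(object value)
    {
        switch (value)
        {
            case string s:                  // quote and escape backslashes and single quotes
                return "'" + s.Replace("\\", "\\\\").Replace("'", "\\'") + "'";
            case bool b:                    // booleans are stored as 1/0 in a uint attribute
                return b ? "1" : "0";
            case IEnumerable<long> mva:     // multi-value attribute, e.g. (1, 2)
                return "(" + string.Join(", ", mva) + ")";
            case IFormattable f:            // numbers, formatted culture-independently
                return f.ToString(null, CultureInfo.InvariantCulture);
            default:
                throw new ArgumentException("Unsupported value type: " + value?.GetType());
        }
    }

    // Builds a single REPLACE statement that covers every row of the batch.
    public static string Build(
        string index, IReadOnlyList<string> columns, IEnumerable<object[]> rows)
    {
        var tuples = rows.Select(
            row => "(" + string.Join(", ", row.Select(v => FormatValue(v))) + ")");

        return "REPLACE INTO `" + index + "` (" + string.Join(", ", columns) + ") VALUES "
             + string.Join(", ", tuples);
    }
}
```

One statement per batch means one network round-trip per batch rather than one per document, which is where most of the gain comes from.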

Deleting Documents

To delete documents from real-time indexes we’ll use the Delete methods that the IFulltextSession interface provides. We can either pass in the IDs of one or more documents to delete, or an instance of a document to delete:

using (IFulltextSession session = fulltextStore.StartSession())
{
    session.Delete<Book>(1, 2);

    session.FlushChanges();
}

Let’s take a look at the generated SphinxQL:

DELETE FROM `books` WHERE id IN (1, 2)

Again, SphinxConnector.NET takes into account that more than one document should be deleted and generates a single DELETE statement instead of two, thus avoiding unnecessary network round-trips.

Transaction Handling

When a call to FlushChanges is made, SphinxConnector.NET executes all saves and deletes within a new transaction for each affected index. The reason for this is that Sphinx limits transactions to a single real-time index. This is something that can get pretty messy when handled manually, but SphinxConnector.NET will take care of that for you.
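The per-index grouping can be pictured roughly as follows. This is a simplified sketch and not the library's code: PerIndexTransactionSketch and Plan are hypothetical names, the pending operations are assumed to be plain (index name, statement) pairs, and error handling such as issuing ROLLBACK on failure is left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative only: hypothetical helper, not part of SphinxConnector.NET.
public static class PerIndexTransactionSketch
{
    // 'pending' holds (index name, SphinxQL statement) pairs in the order they were queued.
    // Returns the statements to execute, with one BEGIN/COMMIT pair per affected index.
    public static List<string> Plan(IEnumerable<KeyValuePair<string, string>> pending)
    {
        var script = new List<string>();

        // GroupBy keeps the groups in order of first appearance and preserves
        // the original order of the statements within each group.
        foreach (var perIndex in pending.GroupBy(p => p.Key))
        {
            script.Add("BEGIN");
            script.AddRange(perIndex.Select(p => p.Value));
            script.Add("COMMIT");
        }

        return script;
    }
}
```

Statements queued for two different indexes therefore end up in two separate transactions, even if they were queued in interleaved order.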

Using the TransactionScope class in conjunction with a full-text session is also fully supported.


Breaking Changes in SphinxConnector.NET 3.0

by Dennis 19. July 2012 17:41

After looking into the fluent query API that will be shipping with SphinxConnector.NET 3.0 in the last three posts, today we’ll be looking at a not so pleasant topic: the breaking changes of the next release.

Requirements Changes


The first thing to mention is that SphinxConnector.NET 3.0 needs at least .NET 3.5 to run. If you’ve read the posts about the fluent query API you probably guessed that already, as it makes heavy use of features only available in .NET 3.5, like expression trees. The next big change is that support for Sphinx versions < 2.0.1 has been dropped. The reason for this is mainly that SphinxQL has greatly improved with the V2 release of Sphinx and many of these improvements are being used for the fluent query API. For those of you who are still using an older Sphinx version, SphinxConnector.NET 2.x will continue to be available and will also receive bug fixes if necessary. However, if you’ve not yet updated to Sphinx 2.x, now is a great time!

Removal of obsolete members


All methods and properties that have been marked as obsolete in SphinxConnector.NET V2 have been removed.

New Namespace for the native API


The native API, i.e. everything that revolves around the SphinxClient class, has been moved into its own namespace which (surprisingly ;-)) is called ‘NativeAPI’. With the addition of a new query API, this is a logical thing to do to keep things organized within the assembly.

A Namespace for Common Types


Types that are used in more than one kind of API have been moved to a namespace named ‘Common’. This applies to classes like SphinxHelper and SphinxException and a few other types that have been added in V3. One could argue that they could have just been left in the root namespace, but IMO that tends to lead to clutter, especially if more classes get added over time.

The only class that is contained in the root namespace is the new SphinxConnectorLicensing class. It has just one method: SetLicense, which should make it pretty clear where the license key belongs. Since the introduction of the SphinxQL API with V2, there has sometimes been confusion about where the license key goes, because it had to be assigned to the License property of the SphinxClient class, which is not that obvious if you’re only using SphinxQL. That property is now gone and hopefully any confusion about the license key with it.

And finally, the namespaces have been renamed such that the root namespace is now named ‘SphinxConnector’. This also means that the assembly of V3 will be named ‘SphinxConnector.dll’.

Conclusion


While breaking changes are certainly annoying, I think that in this case it’s only half as bad. You have to set your license key only once per application, so that’s just a small change. The namespace changes should also be easy to make with Visual Studio’s refactoring capabilities, and even easier if you are using a tool like ReSharper.
