Introducing the Fluent Query API Part 4 of n: Saving and Deleting Documents

by Dennis 21. August 2012 14:54

Disclaimer: The API presented here is still under development, so there might be changes before the final release. If you have any suggestions or comments, post them here or over at Uservoice, or drop me an email!

While SphinxConnector.NET 3.0 is nearing completion and will be released soon, I wanted to write about another nice feature of the fluent API. In the first few posts we’ve focused exclusively on querying. But of course, you’ll also be able to save and delete documents in your (real-time) indexes.

Saving Documents

We’ll use the following class as our document model:

using System;
using System.Collections.Generic;

// Document model for the "books" real-time index; Id holds the Sphinx document id
public class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Author { get; set; }
    public decimal Price { get; set; }
    public bool EbookAvailable { get; set; }
    public DateTime ReleaseDate { get; set; }
    public IList<long> Categories { get; set; }
    public int Weight { get; set; }
}

which is based on the following index definition:

index books
{
    type                 = rt
    path                 = books

    rt_field             = title
    rt_attr_string       = title
    rt_field             = author
    rt_attr_string       = author
    rt_attr_float        = price
    rt_attr_timestamp    = releasedate
    rt_attr_uint         = ebookavailable
    rt_attr_multi        = categories

    charset_table        = 0..9, A..Z->a..z, a..z
    charset_type         = utf-8
}
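The releasedate attribute is declared with rt_attr_timestamp, so Sphinx stores it as a Unix timestamp (seconds since 1970-01-01 00:00 UTC) rather than as a date. The minimal sketch below shows that conversion. The class and method names are hypothetical, and the UTC offset is an explicit assumption, because this post doesn't say how SphinxConnector.NET converts DateTime values. The values 870645600 and 968104800 in the REPLACE statement later in this post match midnight of each ReleaseDate at UTC+2 (central European summer time), which suggests the dates were treated as local time on a machine in that zone.

```csharp
using System;

// Hypothetical helper illustrating the DateTime -> Unix timestamp conversion;
// not SphinxConnector.NET's implementation.
public static class TimestampConversion
{
    // Interprets 'date' as wall-clock time at the given UTC offset and returns
    // the number of seconds since 1970-01-01 00:00 UTC.
    public static long ToUnixTimestamp(DateTime date, TimeSpan utcOffset)
    {
        // Marking the kind as Unspecified lets DateTimeOffset accept any offset;
        // Utc or Local kinds would restrict which offsets are allowed.
        var unspecified = DateTime.SpecifyKind(date, DateTimeKind.Unspecified);
        return new DateTimeOffset(unspecified, utcOffset).ToUnixTimeSeconds();
    }
}
```

Passing TimeSpan.Zero for the first book's date yields 870652800 instead, two hours later. The time zone used for the conversion therefore matters when you later filter by release date.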

The IFulltextSession interface provides a Save method with two overloads, so we can save either a single document or an enumerable of documents:

void Save(object document);

void Save<TDocument>(IEnumerable<TDocument> documents);

Let’s insert two books into our index:

IFulltextStore fulltextStore = new FulltextStore().Initialize();

using (IFulltextSession session = fulltextStore.StartSession())
{
    session.Save(new Book
    {
        Id = 1,
        Author = "George R.R. Martin",
        Title = "A Game of Thrones: A Song of Ice and Fire: Book One",
        EbookAvailable = true,
        Categories = new long[] { 1, 2 },
        Price = 5.60m,
        ReleaseDate = new DateTime(1997, 8, 4)
    });

    session.Save(new Book
    {
        Id = 2,
        Author = "George R.R. Martin",
        Title = "A Clash of Kings: A Song of Ice and Fire: Book Two",
        EbookAvailable = true,
        Categories = new long[] { 1, 2 },
        Price = 7.10m,
        ReleaseDate = new DateTime(2000, 9, 5)
    });

    session.FlushChanges();
}

The above code is pretty straightforward: we create two instances of the Book class and pass them to the Save method. When we’re done, we call FlushChanges to tell SphinxConnector.NET to execute all pending saves and deletes.
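To illustrate the deferred behavior, here is a minimal, hypothetical sketch of a session that only records Save and Delete calls and executes them when FlushChanges is called. PendingChangesSession and its members are invented for illustration. The Action<string> callback stands in for sending an operation to Sphinx. This is not how SphinxConnector.NET is implemented internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of deferred execution; not SphinxConnector.NET's implementation.
public sealed class PendingChangesSession
{
    private readonly List<string> pending = new List<string>();
    private readonly Action<string> execute;

    // 'execute' stands in for sending one operation to the Sphinx server.
    public PendingChangesSession(Action<string> execute)
    {
        this.execute = execute ?? throw new ArgumentNullException(nameof(execute));
    }

    // Save and Delete only record the operation; nothing is executed yet.
    public void Save(long id) => pending.Add($"save {id}");

    public void Delete(long id) => pending.Add($"delete {id}");

    public int PendingCount => pending.Count;

    // Executes all recorded operations in the order they were queued, then clears the queue.
    public void FlushChanges()
    {
        foreach (string operation in pending)
        {
            execute(operation);
        }
        pending.Clear();
    }
}
```

Collecting operations until the flush makes it possible to inspect all of them at once, for example to combine several saves into one statement.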

This is where things get interesting: when SphinxConnector.NET detects that it needs to save more than one document, it inserts them in batches, generating a single REPLACE statement for each batch:

REPLACE INTO `books` (id, title, author, price, ebookavailable, releasedate, categories)
VALUES
    (1, 'A Game of Thrones: A Song of Ice and Fire: Book One', 'George R.R. Martin', 5.60, 1, 870645600, (1, 2)),
    (2, 'A Clash of Kings: A Song of Ice and Fire: Book Two', 'George R.R. Martin', 7.10, 1, 968104800, (1, 2))

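Because each batch is sent as one statement, far fewer statements and network round-trips are needed, which leads to a speed-up of several orders of magnitude compared to inserting documents one by one. The batch size is of course configurable, so you can fine-tune it to your workload.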
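The batching technique can be sketched as follows. ReplaceBatcher, its methods and the value formatting are hypothetical and only illustrate the idea: rows are split into chunks of batchSize, and each chunk becomes one multi-row REPLACE statement. Escaping is simplified to backslashes and single quotes. DateTime values are rejected because they would need converting to Unix timestamps first.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical sketch of batched REPLACE generation; not SphinxConnector.NET's code.
public static class ReplaceBatcher
{
    // Renders one value as a (simplified) SphinxQL literal.
    public static string Literal(object value)
    {
        switch (value)
        {
            case null:
                throw new ArgumentNullException(nameof(value));
            case string s:
                // Escape backslashes first, then single quotes.
                return "'" + s.Replace("\\", "\\\\").Replace("'", "\\'") + "'";
            case bool b:
                return b ? "1" : "0";
            case DateTime _:
                throw new NotSupportedException("Convert DateTime values to Unix timestamps first.");
            case IEnumerable<long> mva:
                // Multi-value attributes are written as a parenthesized list.
                return "(" + string.Join(", ", mva) + ")";
            case IFormattable number:
                // Invariant culture keeps the decimal point, e.g. 5.60 rather than 5,60.
                return number.ToString(null, CultureInfo.InvariantCulture);
            default:
                return value.ToString();
        }
    }

    // Splits the rows into chunks of batchSize and emits one multi-row REPLACE per chunk.
    public static List<string> BuildStatements(
        string index, IReadOnlyList<string> columns, IReadOnlyList<object[]> rows, int batchSize)
    {
        if (batchSize < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        string columnList = string.Join(", ", columns);
        var statements = new List<string>();
        for (int start = 0; start < rows.Count; start += batchSize)
        {
            int end = Math.Min(start + batchSize, rows.Count);
            var tuples = new List<string>();
            for (int i = start; i < end; i++)
            {
                tuples.Add("(" + string.Join(", ", rows[i].Select(v => Literal(v))) + ")");
            }
            string valueList = string.Join(", ", tuples);
            statements.Add($"REPLACE INTO `{index}` ({columnList}) VALUES {valueList}");
        }
        return statements;
    }
}
```

With three rows and a batch size of 2, this produces two statements: one with two value tuples and one with a single tuple. A larger batch size means fewer round-trips but bigger statements, which is the trade-off the configurable batch size lets you tune.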

Deleting Documents

To delete documents from real-time indexes, we’ll use the Delete methods provided by the IFulltextSession interface. We can pass in either the ids of one or more documents to delete or an instance of the document to delete:

using (IFulltextSession session = fulltextStore.StartSession())
{
    session.Delete<Book>(1, 2);

    session.FlushChanges();
}

Let’s take a look at the generated SphinxQL:

DELETE FROM `books` WHERE id IN (1, 2)

Again, SphinxConnector.NET recognizes that more than one document is to be deleted and generates a single DELETE statement instead of two, avoiding unnecessary network round-trips.
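Collapsing several ids into a single IN list is easy to illustrate. The following hypothetical helper (DeleteBuilder is not part of SphinxConnector.NET) builds one DELETE statement for any number of ids. As an extra assumption of this sketch, it also drops duplicate ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not part of SphinxConnector.NET.
public static class DeleteBuilder
{
    // Builds one DELETE statement covering all ids instead of one statement per id.
    public static string BuildDelete(string index, IEnumerable<long> ids)
    {
        // Duplicate ids are dropped (an assumption of this sketch).
        List<long> distinctIds = ids.Distinct().ToList();
        if (distinctIds.Count == 0)
        {
            throw new ArgumentException("At least one id is required.", nameof(ids));
        }

        string idList = string.Join(", ", distinctIds);
        return $"DELETE FROM `{index}` WHERE id IN ({idList})";
    }
}
```

Calling DeleteBuilder.BuildDelete("books", new long[] { 1, 2 }) returns the same statement as the one SphinxConnector.NET generates above.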

Transaction Handling

When FlushChanges is called, SphinxConnector.NET executes all pending saves and deletes in a separate transaction for each affected index, because Sphinx limits a transaction to a single real-time index. This can get pretty messy when handled manually, but SphinxConnector.NET takes care of it for you.
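Here is a hypothetical sketch of the per-index transaction handling. Pending statements are tagged with their target index and grouped by index. Each group is then wrapped in its own BEGIN/COMMIT pair; SphinxQL supports BEGIN, COMMIT and ROLLBACK for real-time indexes. The names are invented for illustration. Error handling, such as issuing ROLLBACK after a failed statement, is left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of per-index transactions; not SphinxConnector.NET's implementation.
public static class PerIndexTransactions
{
    // Groups pending statements by target index and wraps each group in BEGIN/COMMIT,
    // because a Sphinx transaction may only touch a single real-time index.
    public static List<string> BuildScript(IEnumerable<(string Index, string Statement)> pending)
    {
        var script = new List<string>();
        // GroupBy yields groups in the order each index first appears and keeps
        // the original statement order within a group.
        foreach (var group in pending.GroupBy(p => p.Index))
        {
            script.Add("BEGIN");
            script.AddRange(group.Select(p => p.Statement));
            script.Add("COMMIT");
        }
        return script;
    }
}
```

Because each index gets its own transaction in this sketch, a flush that touches several indexes is not atomic as a whole. If the second COMMIT fails, the first index has already been committed.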

Using the TransactionScope class in conjunction with a full-text session is also fully supported.
