Introducing the Fluent Query API Part 4 of n: Saving and Deleting Documents

by Dennis 21. August 2012 14:54

Disclaimer: The API presented here is still under development, so there might be changes until the final release. If you have any suggestions or comments post them here, over at Uservoice or drop me a mail!

While SphinxConnector.NET 3.0 is nearing completion and will be released soon, I wanted to write about another nice feature of the fluent API. In the first few posts we’ve focused exclusively on querying. But of course, you’ll also be able to save and delete documents in your (real-time) indexes.

Saving Documents

 

We’ll use the following class as our document model:

public class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Author { get; set; }
    public decimal Price { get; set; }
    public bool EbookAvailable { get; set; }
    public DateTime ReleaseDate { get; set; }
    public IList<long> Categories { get; set; }
    public int Weight { get; set; }
}

which is based on the following index definition:

index books
{
    type                 = rt
    path                 = books

    rt_field             = title
    rt_attr_string       = title
    rt_field             = author
    rt_attr_string       = author
    rt_attr_float        = price
    rt_attr_timestamp    = releasedate
    rt_attr_uint         = ebookavailable
    rt_attr_multi        = categories

    charset_table        = 0..9, A..Z->a..z, a..z
    charset_type         = utf-8
}

The IFulltextSession interface provides a Save method with two overloads so we can either save a single document or an enumerable of documents.

void Save(object document);

void Save<TDocument>(IEnumerable<TDocument> documents);

Let’s insert two books into our index:

IFulltextStore fulltextStore = new FulltextStore().Initialize();

using (IFulltextSession session = fulltextStore.StartSession())
{
    session.Save(new Book
    {
        Id = 1,
        Author = "George R.R. Martin",
        Title = "A Game of Thrones: A Song of Ice and Fire: Book One",
        EbookAvailable = true,
        Categories = new long[] { 1, 2 },
        Price = 5.60m,
        ReleaseDate = new DateTime(1997, 8, 4)
    });

    session.Save(new Book
    {
        Id = 2,
        Author = "George R.R. Martin",
        Title = "A Clash of Kings: A Song of Ice and Fire: Book Two",
        EbookAvailable = true,
        Categories = new long[] { 1, 2 },
        Price = 7.10m,
        ReleaseDate = new DateTime(2000, 9, 5)
    });

    session.FlushChanges();
}

The above code is pretty straightforward: we create two instances of the Book class and pass them to the Save method. When we’re done, we call FlushChanges to tell SphinxConnector.NET that all pending saves and deletes should be executed.

This is where things get interesting: When SphinxConnector.NET detects that it needs to save more than one document, it inserts them in batches by generating a single REPLACE statement for each batch:

REPLACE INTO `books` (id, title, author, price, ebookavailable, releasedate, categories) 
VALUES (1, 'A Game of Thrones: A Song of Ice and Fire: Book One', 'George R.R. Martin', 5.60, 1,
870645600, (1, 2)), (2, 'A Clash of Kings: A Song of Ice and Fire: Book Two', 'George R.R. Martin', 7.10, 1,
968104800, (1, 2))

This leads to a speed-up of several orders of magnitude compared to inserting documents one by one. The batch size is of course configurable, so you can fine-tune it to your workload.
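To make the batching idea a bit more concrete, here is a minimal sketch of how rows could be grouped into multi-row REPLACE statements. This is not SphinxConnector.NET's actual implementation: the ReplaceBatcher class and its Build and Format methods are hypothetical names made up for this illustration, and the values are assumed to be already converted to what Sphinx expects (e.g. dates as Unix timestamps).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration only, not part of SphinxConnector.NET.
public static class ReplaceBatcher
{
    // Splits the rows into chunks of at most batchSize and yields one
    // multi-row REPLACE statement per chunk.
    public static IEnumerable<string> Build(string index, IEnumerable<string> columns,
                                            IEnumerable<object[]> rows, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        string header = "REPLACE INTO `" + index + "` (" + string.Join(", ", columns) + ") VALUES ";
        var batch = new List<string>();

        foreach (object[] row in rows)
        {
            batch.Add("(" + string.Join(", ", row.Select(Format)) + ")");

            if (batch.Count == batchSize)
            {
                yield return header + string.Join(", ", batch);
                batch.Clear();
            }
        }

        // Emit the remaining rows that did not fill a complete batch.
        if (batch.Count > 0)
            yield return header + string.Join(", ", batch);
    }

    // Renders a single value as a SphinxQL literal (simplified escaping).
    private static string Format(object value)
    {
        if (value is string s)
            return "'" + s.Replace("\\", "\\\\").Replace("'", "\\'") + "'";
        if (value is bool b)
            return b ? "1" : "0";
        if (value is IEnumerable<long> ids)
            return "(" + string.Join(", ", ids) + ")";   // multi-value attribute
        if (value is IFormattable f)
            return f.ToString(null, CultureInfo.InvariantCulture);

        throw new NotSupportedException("Unsupported value: " + value);
    }
}
```

With a batch size of 2, three rows would result in two statements: one containing two rows and one containing the remaining row.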

Deleting Documents

 

To delete documents from real-time indexes we’ll use the Delete methods that the IFulltextSession interface provides. We can pass in either the ids of one or more documents to delete, or an instance of the document to delete:

using (IFulltextSession session = fulltextStore.StartSession())
{
    session.Delete<Book>(1, 2);

    session.FlushChanges();
}

Let’s take a look at the generated SphinxQL:

DELETE FROM `books` WHERE id IN (1, 2)

Again, SphinxConnector.NET takes into account that more than one document should be deleted and generates a single DELETE statement instead of two, thus avoiding unnecessary network round-trips.

Transaction Handling

 

When a call to FlushChanges is made, SphinxConnector.NET executes all saves and deletes within a new transaction for each affected index. The reason for this is that Sphinx limits transactions to a single real-time index. This is something that can get pretty messy when handled manually, but SphinxConnector.NET will take care of that for you.

Using the TransactionScope class in conjunction with a full-text session is also fully supported.
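The per-index transaction handling described above can be pictured roughly like this. The sketch below is only an illustration under my own assumptions, not the code SphinxConnector.NET actually runs: FlushPlanner and Plan are hypothetical names, and pending operations are modelled simply as pairs of index name and SphinxQL statement.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only, not part of SphinxConnector.NET.
public static class FlushPlanner
{
    // pending: (index name, SphinxQL statement) pairs in the order they were queued.
    // Returns the statements wrapped in one BEGIN ... COMMIT block per index,
    // because a Sphinx transaction is limited to a single real-time index.
    public static IList<string> Plan(IEnumerable<KeyValuePair<string, string>> pending)
    {
        var script = new List<string>();

        // GroupBy keeps the groups in order of first appearance and the
        // statements inside each group in their original order.
        foreach (var group in pending.GroupBy(p => p.Key))
        {
            script.Add("BEGIN");
            script.AddRange(group.Select(p => p.Value));
            script.Add("COMMIT");
        }

        return script;
    }
}
```

Saves and deletes queued for two different indexes would end up in two separate transactions, regardless of the order in which they were queued.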


Breaking Changes in SphinxConnector.NET 3.0

by Dennis 19. July 2012 17:41

After looking into the fluent query API that will be shipping with SphinxConnector.NET 3.0 in the last three posts, today we’ll be looking at a not so pleasant topic: the breaking changes of the next release.

Requirements Changes


The first thing to mention is that SphinxConnector.NET 3.0 needs at least .NET 3.5 to run. If you’ve read the posts about the fluent query API you probably guessed that already, as it makes heavy use of features only available in .NET 3.5 and later, like expression trees. The next big change is that support for Sphinx versions < 2.0.1 has been dropped. The main reason for this is that SphinxQL has greatly improved with the V2 release of Sphinx, and many of these improvements are being used for the fluent query API. For those of you who are still using an older Sphinx version, SphinxConnector.NET 2.x will continue to be available and will also receive bug fixes if necessary. However, if you’ve not yet updated to Sphinx 2.x, now is a great time!

Removal of obsolete members


All methods and properties that have been marked as obsolete in SphinxConnector.NET V2 have been removed.

New Namespace for the native API


The native API, i.e. everything that revolves around the SphinxClient class, has been moved into its own namespace which (surprisingly ;-)) is called ‘NativeAPI’. With the addition of a new query API this is a logical thing to do to keep things organized within the assembly.

A Namespace for Common Types


Types that are used in more than one kind of API have been moved to a namespace named ‘Common’. This applies to classes like SphinxHelper and SphinxException and a few other types that have been added in V3. One could argue that they could have just been left in the root namespace, but IMO that tends to lead to clutter, especially if more classes get added over time.

The only class contained in the root namespace is the new SphinxConnectorLicensing class. It has just one method, SetLicense, which should make it pretty clear where the license key belongs. Since the introduction of the SphinxQL API with V2, there has sometimes been confusion about where the license key goes, because it had to be assigned to the License property of the SphinxClient class, which is not that obvious if you’re only using SphinxQL. That property is now gone, and hopefully any confusion about the license key with it.

And finally, the namespaces have been renamed such that the root namespace is now named ‘SphinxConnector’. This also means that the assembly of V3 will be named ‘SphinxConnector.dll’.

Conclusion


While breaking changes are certainly annoying, I think that in this case it’s only half as bad. You have to set your license key only once per application, so that’s just a small change. The namespace changes should also be easy to make with Visual Studio’s refactoring capabilities, and even easier if you are using a tool like ReSharper.


Introducing the Fluent Query API Part 3 of n: Aggregates, Functions and Projections

by Dennis 14. June 2012 06:47

Disclaimer: The API presented here is still under development, so there might be changes until the final release. If you have any suggestions or comments post them here, over at Uservoice or drop me a mail!

In this post we’ll be taking a look at how the new fluent query API handles the creation of aggregate values and the projection of documents into new types. Also, we’ll see how it translates .NET method calls to calls to functions that are supported by Sphinx. 

Projecting Results


For projecting results into a new form, the IFulltextQuery interface provides the Select method. If you’ve ever used LINQ, you’ll probably already know what to do with it ;-). It can be used to select only one attribute from the index, e.g. the document id (for the declaration of the Product class please refer to the second post of this series):

IList<int> results = fulltextSession.Query<Product>().
                                     Match("a product").
                                     Select(p => p.Id).
                                     Results();

The generated SphinxQL query will then look like this:

SELECT id AS c1 FROM `product` WHERE MATCH('a product')

As you can see, only the id attribute is being retrieved from the index, thus avoiding unnecessary data transmission from Sphinx to the client.

We could also use it to project only the needed attributes for a given use-case into an anonymous type:

var results = fulltextSession.Query<Product>().
                              Match("a product").
                              Select(p => new
                              {
                                  p.Name,
                                  p.Price
                              }).Results();

 

Aggregates


In order to create aggregate values like the sum, the maximum of values etc., the API provides a static class named Projection which contains methods for all supported aggregation operations. For example, for a product search we could get the number of categories that contain matching products and the minimum and maximum prices in each category like this:

var results = fulltextSession.Query<Product>().
                              Match("a product").
                              GroupBy(p => p.CategoryId).
                              Select(p => new
                              {
                                  p.CategoryId,
                                  ProductCount = Projection.Count(),
                                  MinimumPrice = Projection.Min(() => p.Price),
                                  MaximumPrice = Projection.Max(() => p.Price)
                              }).Results();

 

Functions


Sphinx supports quite a few functions that can be used in a query. These range from numeric functions like FLOOR and CEIL, through date functions like YEAR, to comparison functions like IF. SphinxConnector.NET supports these functions by recognizing the corresponding .NET methods and translating them to their Sphinx equivalents. Most numeric functions can be used via the Math class provided by .NET, e.g.

var results = fulltextSession.Query<Product>().
                              Select(p => new
                              {
                                  Floor = Math.Floor(p.Price),
                                  Ceiling = Math.Ceiling(p.Price)
                              }).Results();

The date functions can be used via the methods of the DateTime class, and IF can be used via the ternary operator, e.g.:

var results = fulltextSession.Query<Product>().
                              Select(p => new
                              {
                                  Price = p.CategoryId == 5 ? p.Price * 0.9m : p.Price
                              }).Results();

will be translated to:

SELECT IF(categoryid = 5, price * 0.9, price) AS c1 FROM `product`
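If you are curious how such a translation can work in principle, here is a heavily simplified sketch that walks an expression tree and turns a ternary expression into an IF call. It is not the translator used by SphinxConnector.NET: IfTranslator and its methods are hypothetical names for this illustration, the Product class is a cut-down stand-in, only a handful of node types are handled, and operator precedence (parentheses) is ignored.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Cut-down stand-in for the document class used in this series.
public class Product
{
    public int CategoryId { get; set; }
    public int VendorId { get; set; }
    public decimal Price { get; set; }
}

// Hypothetical translator for illustration only, not part of SphinxConnector.NET.
public static class IfTranslator
{
    public static string Translate<T, TResult>(Expression<Func<T, TResult>> selector)
    {
        return Visit(selector.Body);
    }

    private static string Visit(Expression e)
    {
        switch (e)
        {
            case ConditionalExpression c:   // test ? a : b  ->  IF(test, a, b)
                return "IF(" + Visit(c.Test) + ", " + Visit(c.IfTrue) + ", " + Visit(c.IfFalse) + ")";
            case BinaryExpression b:        // left op right
                return Visit(b.Left) + " " + Operator(b.NodeType) + " " + Visit(b.Right);
            case MemberExpression m:        // p.CategoryId  ->  categoryid
                return m.Member.Name.ToLowerInvariant();
            case ConstantExpression k:      // literal values
                return Convert.ToString(k.Value, CultureInfo.InvariantCulture);
            case UnaryExpression u when u.NodeType == ExpressionType.Convert:
                return Visit(u.Operand);    // ignore implicit numeric conversions
            default:
                throw new NotSupportedException(e.NodeType.ToString());
        }
    }

    private static string Operator(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.Equal: return "=";
            case ExpressionType.LessThan: return "<";
            case ExpressionType.GreaterThan: return ">";
            case ExpressionType.Add: return "+";
            case ExpressionType.Subtract: return "-";
            case ExpressionType.Multiply: return "*";
            case ExpressionType.Divide: return "/";
            default: throw new NotSupportedException(type.ToString());
        }
    }
}
```

Calling IfTranslator.Translate<Product, decimal>(p => p.CategoryId == 5 ? p.Price * 0.9m : p.Price) should then produce the IF expression from the statement above, without the column alias.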

For functions that have no corresponding .NET method, SphinxConnector.NET provides the Function class which contains methods for functions like Fibonacci and Geodist. Additionally, there are extension methods for the IN and INTERVAL functions. Here’s an example for getting the number of products in certain price intervals for some categories:

var results = fulltextSession.Query<Product>().
                              Where(p => p.CategoryId.In(4, 8, 15, 16, 23, 42)).
                              Select(p => new
                              {
                                  Count = Projection.Count(),
                                  PriceInterval = p.Price.Interval(10, 50, 100, 1000)
                              }).
                              GroupBy(p => p.PriceInterval).
                              Results();


Introducing the Fluent Query API Part 2 of n: A Closer Look at Querying

by Dennis 30. May 2012 07:15

Disclaimer: The API presented here is still under development, so there might be changes until the final release. If you have any suggestions or comments post them here, over at Uservoice or drop me a mail!

In the last post I gave a quick overview of the new fluent query API. In this post we will explore one of the main interfaces that developers will interact with: the IFulltextQuery<T> interface. This interface provides all the necessary methods for building a query and retrieving the results from the Sphinx server. IFulltextQuery is a generic interface, where the generic type argument is a class that models the document that the Sphinx index contains.

Suppose we have an index source defined like this (other fields omitted for brevity):

source product
{
    sql_field_string = name
    sql_field_string = description
    sql_attr_float   = price
    sql_attr_uint    = categoryid
    sql_attr_uint    = vendorid
    sql_attr_float   = weight
}

We would then define a class called Product like this:

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public decimal Price { get; set; }
    public int CategoryId { get; set; }
    public int VendorId { get; set; }
    public int Weight { get; set; }
}

Note that we have also added a property named “Weight” to be able to retrieve the weight that Sphinx assigns to a match. We can then start querying the index like this:

FulltextStore fulltextStore = new FulltextStore();

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    var results = fulltextSession.Query<Product>().
                                  Match("a product").
                                  Where(p => p.Price <= 10).
                                  Results();
}    

which will be translated to the following SphinxQL statement:

SELECT id AS c1, name AS c2, description AS c3, price AS c4, categoryid AS c5, 
       vendorid AS c6, weight() AS c7 
FROM product 
WHERE MATCH('a product') AND price <= 10.0

Note that the Product class does not need to be marked with any attributes or have any mappings defined to be used for querying the index. The fluent query API uses conventions to translate class names to index names and property names to attribute names. It comes with a set of default conventions, but you will of course be able to specify your own conventions.
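To illustrate what convention-based mapping means here, the following sketch derives an index name and a column list from a class via reflection. It only mimics the behaviour visible in the statement above (lower-cased names, Weight mapped to weight()); the DefaultConventions class and its methods are hypothetical names of my own and not the API that SphinxConnector.NET exposes for customizing conventions.

```csharp
using System;
using System.Linq;
using System.Reflection;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public decimal Price { get; set; }
    public int CategoryId { get; set; }
    public int VendorId { get; set; }
    public int Weight { get; set; }
}

// Hypothetical conventions for illustration only, not part of SphinxConnector.NET.
public static class DefaultConventions
{
    // Class name -> index name, e.g. Product -> product.
    public static string IndexName(Type documentType)
    {
        return documentType.Name.ToLowerInvariant();
    }

    // Property name -> select expression; Weight is assumed to map to weight().
    public static string SelectExpression(PropertyInfo property)
    {
        return property.Name == "Weight" ? "weight()" : property.Name.ToLowerInvariant();
    }

    public static string BuildSelect(Type documentType)
    {
        // Reflection does not guarantee an order, so sort by metadata token,
        // which usually corresponds to declaration order.
        var columns = documentType
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .OrderBy(p => p.MetadataToken)
            .Select((p, i) => SelectExpression(p) + " AS c" + (i + 1));

        return "SELECT " + string.Join(", ", columns) + " FROM `" + IndexName(documentType) + "`";
    }
}
```

DefaultConventions.BuildSelect(typeof(Product)) builds a select list with one aliased column per property, similar to the statement shown above.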

 

Ordering and Grouping

 

The IFulltextQuery interface exposes the following methods for ordering and grouping results:

IFulltextQuery<T> GroupBy<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> OrderBy<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> OrderByDescending<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> ThenBy<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> ThenByDescending<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> WithinGroupOrderBy<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> WithinGroupOrderByDescending<TKey>(Expression<Func<T, TKey>> keySelector);

There should be no big surprises here. In case you are wondering, OrderBy and ThenBy can be used interchangeably; ThenBy is intended to improve the readability of a query when ordering by multiple keys. Additionally, we have WithinGroupOrderBy and WithinGroupOrderByDescending to define the sort order within a group. Here’s an example that uses some of these methods:

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    var results = fulltextSession.Query<Product>().
                                  Match("a product").
                                  GroupBy(x => x.CategoryId).WithinGroupOrderBy(x => x.Price).
                                  OrderBy(x => x.Name).
                                  Results();
}

 

Changing Result Set Sizes

 

For limiting and expanding the size of a query result, the IFulltextQuery interface provides two methods: Take(int count) and Limit(int skip, int take). Both should be pretty much self-explanatory.

 

Setting Query Options

 

For setting the options for a query, the IFulltextQuery interface exposes a method called Options which takes a delegate as an argument that can be used to adjust the settings. The next example sets the ranker to SPH04, sets a field weight for the description and specifies a value of 50 for the maximum number of documents to match. We also use the Take method to indicate that we want to retrieve all 50 results, because Sphinx by default limits the result set size to 20.

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    var results = fulltextSession.Query<Product>().
                                  Match("a product").
                                  Options(o => o.Ranker(SphinxRankMode.SPH04).
                                                 FieldWeight(x => x.Description, 1000).
                                                 MaxMatches(50)).
                                  Take(50).
                                  Results();
}

 

Retrieving Query Metadata

 

The last thing we’re going to look at today is how to retrieve metadata for a query, i.e. information like query execution time and keywords matched. For this, the Results method has an overload that takes an instance of a class named QueryMetadata as an out parameter:

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    QueryMetadata metadata;

    var results = fulltextSession.Query<Product>().
                                  Match("a product").
                                  Results(out metadata);

    Console.WriteLine("{0} {1} {2}", metadata.Total, metadata.TotalFound, metadata.Time);

    foreach (SphinxWordInfo wordInfo in metadata.WordInfo)
    {
        Console.WriteLine("{0} {1} {2}", wordInfo.Word, wordInfo.HitCount, 
                                         wordInfo.MatchingDocumentsCount);
    }
}

That is all for now. In the next post we’ll be looking at aggregates, functions and result set projection.


Introducing the Fluent Query API Part 1 of n: A Quick Overview

by Dennis 4. May 2012 08:48

Disclaimer: The API presented here is still under development, so there might be changes until the final release. If you have any suggestions or comments post them here, over at Uservoice or drop me a mail!

With the upcoming release of SphinxConnector.NET 3.0 there will be a new addition to the APIs provided by SphinxConnector.NET: the fluent query API. This new API lets you (surprise!) fluently compose your full-text queries based on an object model of the data contained in the index. With this approach building queries is much simpler and much more pleasant than writing SphinxQL by hand. But see for yourself:

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    IList<Product> products = fulltextSession.Query<Product>().
                                              Match("my product query").
                                              Where(x => x.VendorId == 2 && x.CategoryId == 5).
                                              OrderBy(x => x.Price).
                                              Take(100).
                                              Results();
}

versus:

List<Product> products = new List<Product>();

using (SphinxQLConnection connection = new SphinxQLConnection())
{
    SphinxQLCommand command = connection.CreateCommand(@"SELECT * FROM products     
                                                         WHERE MATCH(@query) 
                                                         AND VendorId = @vendorId 
                                                         AND CategoryId = @categoryId
                                                         ORDER BY Price ASC LIMIT 0, 100");

    command.Parameters.Add("query", "my product query");
    command.Parameters.Add("vendorId", "2");
    command.Parameters.Add("categoryId", "5");

    connection.Open();

    using (SphinxQLDataReader dataReader = command.ExecuteReader())
    {
        while (dataReader.Read())
        {
            Product product = new Product
                                  {
                                      Id = dataReader.GetInt32("Id"),
                                      CategoryId = dataReader.GetInt32("CategoryId"),
                                      VendorId = dataReader.GetInt32("VendorId"),
                                      Name = dataReader.GetString("Name")
                                  };

            products.Add(product);
        }
    }
}

And that’s not even a really complex query. How about this:

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    var results = fulltextSession.Query<Product>().
                                  Match("my product query").
                                  GroupBy(p => p.CategoryId).
                                  WithinGroupOrderByDescending(p => p.Weight).
                                  WithinGroupOrderBy(p => p.Price).
                                  Select(p => new
                                  {
                                      p.Id,
                                      p.Name,
                                      p.Price,
                                      ProductsInCategory = Projection.Count()
                                  }).
                                  OrderByDescending(x => x.ProductsInCategory).
                                  Results();
}

I’ll spare you the SphinxQL equivalent ;-). I’ll be posting in more detail about the classes and methods involved in these examples in the course of this series. For now, you can see that we have a class called Product that represents the data we want to query (which can be in one or more indexes), and an interface called IFulltextSession which is provided by a class named FulltextStore. The Query method of the IFulltextSession interface returns an instance of IFulltextQuery<T>, which in turn provides the methods to perform full-text queries.

This was just a quick introduction and basic overview, to let you see what the new query API has to offer. In the next parts, we'll take a more detailed look at each component, so stay tuned!
