Introducing the Fluent Query API Part 3 of n: Aggregates, Functions and Projections

by Dennis 14. June 2012 06:47

Disclaimer: The API presented here is still under development, so there might be changes until the final release. If you have any suggestions or comments post them here, over at Uservoice or drop me a mail!

In this post we’ll be taking a look at how the new fluent query API handles the creation of aggregate values and the projection of documents into new types. We’ll also see how it translates .NET method calls into calls to functions that are supported by Sphinx.

Projecting Results


For projecting results into a new form, the IFulltextQuery interface provides the Select method. If you’ve ever used LINQ, you’ll probably already know what to do with it ;-). It can be used to select only one attribute from the index, e.g. the document id (for the declaration of the Product class, please refer to the second post of this series):

IList<int> results = fulltextSession.Query<Product>().
                                     Match("a product").
                                     Select(p => p.Id).
                                     Results();

The generated SphinxQL query will then look like this:

SELECT id AS c1 FROM `product` WHERE MATCH('a product')

As you can see, only the id attribute is being retrieved from the index, thus avoiding unnecessary data transmission from Sphinx to the client.

We could also use it to project only the needed attributes for a given use-case into an anonymous type:

var results = fulltextSession.Query<Product>().
                              Match("a product").
                              Select(p => new
                              {
                                  p.Name,
                                  p.Price
                              }).Results();

 

Aggregates


In order to create aggregate values like the sum, the maximum of values etc., the API provides a static class named Projection which contains methods for all supported aggregation operations. For example, for a product search we could get the number of categories that contain matching products and the minimum and maximum prices in each category like this:

var results = fulltextSession.Query<Product>().
                              Match("a product").
                              GroupBy(p => p.CategoryId).
                              Select(p => new
                              {
                                  p.CategoryId,
                                  ProductCount = Projection.Count(),
                                  MinimumPrice = Projection.Min(() => p.Price),
                                  MaximumPrice = Projection.Max(() => p.Price)
                              }).Results();

 

Functions


Sphinx supports quite a few functions that can be used in a query. They range from numeric functions like FLOOR and CEIL, through date functions like YEAR, to comparison functions like IF. SphinxConnector.NET supports these functions by recognizing the corresponding .NET methods and translating them to their Sphinx equivalents. Most numeric functions can be used via the Math class provided by .NET, e.g.:

var results = fulltextSession.Query<Product>().
                              Select(p => new
                              {
                                  Floor = Math.Floor(p.Price),
                                  Ceiling = Math.Ceiling(p.Price)
                              }).Results();

The date functions can be used via the methods of the DateTime class, and IF can be used via the ternary operator, e.g.:

var results = fulltextSession.Query<Product>().
                              Select(p => new
                              {
                                  Price = p.CategoryId == 5 ? p.Price * 0.9m : p.Price
                              }).Results();

will be translated to:

SELECT IF(categoryid = 5, price * 0.9, price) AS c1 FROM `product`
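
To get a feeling for how such a translation can work, here is a minimal sketch that walks the expression tree of a lambda and renders the ternary operator as an IF() call. It is not SphinxConnector.NET's actual implementation: the ToySphinxQLTranslator and ToyProduct names are made up for this illustration, only the handful of node types needed for the example above are handled, and lower-casing member names is an assumption based on the generated query shown above.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Hypothetical stand-in for the Product class, reduced to what the example needs.
public class ToyProduct
{
    public int CategoryId { get; set; }
    public decimal Price { get; set; }
}

// Toy translator (an illustration only, not the library's code).
public static class ToySphinxQLTranslator
{
    public static string Translate<T, TResult>(Expression<Func<T, TResult>> selector)
    {
        return Render(selector.Body);
    }

    private static string Render(Expression node)
    {
        switch (node.NodeType)
        {
            case ExpressionType.Conditional:
                // The ternary operator becomes IF(test, ifTrue, ifFalse).
                var conditional = (ConditionalExpression)node;
                return "IF(" + Render(conditional.Test) + ", " +
                       Render(conditional.IfTrue) + ", " +
                       Render(conditional.IfFalse) + ")";
            case ExpressionType.Equal:
                return RenderBinary((BinaryExpression)node, "=");
            case ExpressionType.Multiply:
                return RenderBinary((BinaryExpression)node, "*");
            case ExpressionType.MemberAccess:
                // Assumed convention: property name in lower case.
                return ((MemberExpression)node).Member.Name.ToLowerInvariant();
            case ExpressionType.Constant:
                // Invariant culture so decimals always use '.' as separator.
                return Convert.ToString(((ConstantExpression)node).Value,
                                        CultureInfo.InvariantCulture);
            case ExpressionType.Convert:
                // Ignore implicit conversions inserted by the compiler.
                return Render(((UnaryExpression)node).Operand);
            default:
                throw new NotSupportedException(node.NodeType.ToString());
        }
    }

    private static string RenderBinary(BinaryExpression node, string op)
    {
        return Render(node.Left) + " " + op + " " + Render(node.Right);
    }
}
```

Calling ToySphinxQLTranslator.Translate((ToyProduct p) => p.CategoryId == 5 ? p.Price * 0.9m : p.Price) returns the string IF(categoryid = 5, price * 0.9, price). A real translator would of course also have to deal with aliases, method calls, parameters and many more node types.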

For functions that have no corresponding .NET method, SphinxConnector.NET provides the Function class which contains methods for functions like Fibonacci and Geodist. Additionally, there are extension methods for the IN and INTERVAL functions. Here’s an example for getting the number of products in certain price intervals for some categories:

var results = fulltextSession.Query<Product>().
                              Where(p => p.CategoryId.In(4, 8, 15, 16, 23, 42)).
                              Select(p => new
                              {
                                  Count = Projection.Count(),
                                  PriceInterval = p.Price.Interval(10, 50, 100, 1000)
                              }).
                              GroupBy(p => p.PriceInterval).
                              Results();


Introducing the Fluent Query API Part 2 of n: A Closer Look at Querying

by Dennis 30. May 2012 07:15

Disclaimer: The API presented here is still under development, so there might be changes until the final release. If you have any suggestions or comments post them here, over at Uservoice or drop me a mail!

In the last post I gave a quick overview of the new fluent query API. In this post we will explore one of the main interfaces that developers will interact with: the IFulltextQuery<T> interface. This interface provides all the necessary methods for building a query and retrieving the results from the Sphinx server. IFulltextQuery is a generic interface, where the generic type argument is a class that models the document that the Sphinx index contains.

Suppose we have an index source defined like this (other fields omitted for brevity):

source product 
{       
    sql_field_string = name
    sql_field_string = description
    sql_attr_float = price
    sql_attr_uint = categoryid
    sql_attr_uint = vendorid
    sql_attr_float = weight
}

We would then define a class called Product like this:

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public decimal Price { get; set; }
    public int CategoryId { get; set; }
    public int VendorId { get; set; }
    public int Weight { get; set; }
}

Note that we have also added a property named “Weight” to be able to retrieve the weight that Sphinx assigns to a match. We can then start querying the index like this:

FulltextStore fulltextStore = new FulltextStore();

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    var results = fulltextSession.Query<Product>().
                                  Match("a product").
                                  Where(p => p.Price <= 10).
                                  Results();
}    

which will be translated to the following SphinxQL statement:

SELECT id AS c1, name AS c2, description AS c3, price AS c4, categoryid AS c5, 
       vendorid AS c6, weight() AS c7 
FROM product 
WHERE MATCH('a product') AND price <= 10.0

Note that the Product class does not need to be marked with any attributes or have any mappings defined to be used for querying the index. The fluent query API uses conventions to translate class names to index names and property names to attribute names. It comes with a set of default conventions, but you will of course be able to specify your own conventions.
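
As a rough illustration of what such a convention might look like, the following sketch simply lower-cases type and property names. The default conventions themselves are not spelled out in this post, so this is an assumption based on the names in the generated query above, and ToyNamingConventions is a hypothetical helper, not part of SphinxConnector.NET.

```csharp
using System;
using System.Reflection;

// Hypothetical convention helper (an illustration only, not the library's code).
public static class ToyNamingConventions
{
    // Assumed convention: the index name is the class name in lower case.
    public static string IndexNameFor(Type documentType)
    {
        return documentType.Name.ToLowerInvariant();
    }

    // Assumed convention: the attribute name is the property name in lower case.
    public static string AttributeNameFor(PropertyInfo property)
    {
        return property.Name.ToLowerInvariant();
    }
}
```

With the Product class from above, IndexNameFor(typeof(Product)) would give product, and the CategoryId property would map to categoryid. Special cases such as Weight, which shows up as a weight() call in the generated query, would need extra handling that this sketch leaves out.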

 

Ordering and Grouping

 

The IFulltextQuery interface exposes the following methods for ordering and grouping results:

IFulltextQuery<T> GroupBy<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> OrderBy<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> OrderByDescending<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> ThenBy<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> ThenByDescending<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> WithinGroupOrderBy<TKey>(Expression<Func<T, TKey>> keySelector);

IFulltextQuery<T> WithinGroupOrderByDescending<TKey>(Expression<Func<T, TKey>> keySelector);

There should be no big surprises here. In case you are wondering, OrderBy and ThenBy can be used interchangeably; ThenBy is only there to improve the readability of a query when ordering by multiple keys. Additionally, we have WithinGroupOrderBy and WithinGroupOrderByDescending to define the sort order within a group. Here’s an example that uses some of these methods:

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    var results = fulltextSession.Query<Product>().
                                  Match("a product").
                                  GroupBy(x => x.CategoryId).WithinGroupOrderBy(x => x.Price).
                                  OrderBy(x => x.Name).
                                  Results();
}

 

Changing Result Set Sizes

 

For limiting and expanding the size of a query result, the IFulltextQuery interface provides two methods: Take(int count) and Limit(int skip, int take). Both should be pretty much self-explanatory.
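
In SphinxQL terms, both presumably end up in the LIMIT clause, which takes an offset followed by a row count. The helper below is a hypothetical sketch of that mapping (ToyLimitClause is not a library type), treating Take(count) as Limit(0, count):

```csharp
using System;

// Hypothetical helper that renders a SphinxQL LIMIT clause (illustration only).
public static class ToyLimitClause
{
    // LIMIT offset, row_count: skip is the offset, take is the row count.
    public static string Limit(int skip, int take)
    {
        if (skip < 0) throw new ArgumentOutOfRangeException("skip");
        if (take < 0) throw new ArgumentOutOfRangeException("take");
        return "LIMIT " + skip + ", " + take;
    }

    // Assumption: taking the first 'count' rows is the same as skipping none.
    public static string Take(int count)
    {
        return Limit(0, count);
    }
}
```

ToyLimitClause.Take(100) yields LIMIT 0, 100, the same clause that appears in the hand-written query in the first post of this series.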

 

Setting Query Options

 

For setting the options for a query, the IFulltextQuery interface exposes a method called Options, which takes a delegate that can be used to adjust the settings. The next example sets the ranker to SPH04, sets a field weight for the description and specifies a value of 50 for the maximum number of documents to match. We also use the Take method to indicate that we want to retrieve all 50 results, because Sphinx by default limits the result set size to 20.

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    var results = fulltextSession.Query<Product>().
                                  Match("a product").
                                  Options(o => o.Ranker(SphinxRankMode.SPH04).
                                                 FieldWeight(x => x.Description, 1000).
                                                 MaxMatches(50)).
                                  Take(50).
                                  Results();
}

 

Retrieving Query Metadata

 

The last thing we’re going to look at today is how to retrieve metadata for a query, i.e. information like query execution time and keywords matched. For this, the Results method has an overload that takes an instance of a class named QueryMetadata as an out parameter:

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    QueryMetadata metadata;

    var results = fulltextSession.Query<Product>().
                                  Match("a product").
                                  Results(out metadata);

    Console.WriteLine("{0} {1} {2}", metadata.Total, metadata.TotalFound, metadata.Time);

    foreach (SphinxWordInfo wordInfo in metadata.WordInfo)
    {
        Console.WriteLine("{0} {1} {2}", wordInfo.Word, wordInfo.HitCount, 
                                         wordInfo.MatchingDocumentsCount);
    }
}

That is all for now, in the next post we’ll be looking at aggregates, functions and result set projection.


Introducing the Fluent Query API Part 1 of n: A Quick Overview

by Dennis 4. May 2012 08:48

Disclaimer: The API presented here is still under development, so there might be changes until the final release. If you have any suggestions or comments post them here, over at Uservoice or drop me a mail!

With the upcoming release of SphinxConnector.NET 3.0 there will be a new addition to the APIs provided by SphinxConnector.NET: the fluent query API. This new API lets you (surprise!) fluently compose your full-text queries based on an object model of the data contained in the index. With this approach building queries is much simpler and much more pleasant than writing SphinxQL by hand. But see for yourself:

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    IList<Product> products = fulltextSession.Query<Product>().
                                              Match("my product query").
                                              Where(x => x.VendorId == 2 && x.CategoryId == 5).
                                              OrderBy(x => x.Price).
                                              Take(100).
                                              Results();
}

versus:

List<Product> products = new List<Product>();

using (SphinxQLConnection connection = new SphinxQLConnection())
{
    SphinxQLCommand command = connection.CreateCommand(@"SELECT * FROM products     
                                                         WHERE MATCH(@query) 
                                                         AND VendorId = @vendorId 
                                                         AND CategoryId = @categoryId
                                                         ORDER BY Price ASC LIMIT 0, 100");

    command.Parameters.Add("query", "my product query");
    command.Parameters.Add("vendorId", "2");
    command.Parameters.Add("categoryId", "5");

    connection.Open();

    using (SphinxQLDataReader dataReader = command.ExecuteReader())
    {
        while (dataReader.Read())
        {
            Product product = new Product
                                  {
                                      Id = dataReader.GetInt32("Id"),
                                      CategoryId = dataReader.GetInt32("CategoryId"),
                                      VendorId = dataReader.GetInt32("VendorId"),
                                      Name = dataReader.GetString("Name")
                                  };

            products.Add(product);
        }
    }
}

And that’s not even a really complex query. How about this:

using (IFulltextSession fulltextSession = fulltextStore.StartSession())
{
    var results = fulltextSession.Query<Product>().
                                  Match("my product query").
                                  GroupBy(p => p.CategoryId).
                                  WithinGroupOrderByDescending(p => p.Weight).
                                  WithinGroupOrderBy(p => p.Price).
                                  Select(p => new
                                  {
                                      p.Id,
                                      p.Name,
                                      p.Price,
                                      ProductsInCategory = Projection.Count()
                                  }).
                                  OrderByDescending(x => x.ProductsInCategory).
                                  Results();
}

I’ll spare you the SphinxQL equivalent ;-). I’ll be posting in more detail about the classes and methods involved in these examples in the course of this series. For now, you can see that we have a class called Product that represents the data we want to query (which can be in one or more indexes), and an interface called IFulltextSession which is provided by a class named FulltextStore. The Query method of the IFulltextSession interface returns an instance of IFulltextQuery<T>, which in turn provides the methods to perform full-text queries.

This was just a quick introduction and basic overview, to let you see what the new query API has to offer. In the next parts, we'll take a more detailed look at each component, so stay tuned!
