Usage

Installation

To use Featrix, first install the client using pip:

$ pip install featrix-client     # Coming soon.

You’ll also need a Featrix server; you can run the enterprise edition on-site in your environment or use our hosted SaaS.

What’s Included

The featrix-client package includes a few key modules:

networkclient

A FeatrixTransformerClient for accessing a Featrix embedding service.

graphics

A set of functions for plotting embedding similarity.

utils

A set of functions for working with data that we have found to be useful.

Working with Data

1import featrixclient as ft
2import pandas as pd
3df = pd.read_csv(path_to_your_file)

Train a vector space and a model

You can train multiple models on a single vector space.

Check out our live Google Colab demo notebooks <https://featrix.ai/demo> for examples. The general approach is as follows:

 1# Split the data
 2df_train, df_test = train_test_split(df, test_size=0.25)
 3
 4# Connect to the Featrix server. This can be deployed on prem with Docker
 5# or Featrix’s public cloud.
 6featrix = ft.Featrix("http://embedding.featrix.com:8080")
 7
 8# Here we create a new vector space and train it on the data.
 9vector_space_id = featrix.EZ_NewVectorSpace(df_train)
10
11# We can create multiple models within a single vector space.
12# This lets us re-use representations for different predictions
13# without retraining the vector space.
14# Note, too, that you could train the model on a different training
15# set than the vector space, if you want to zero in on something
16# for a specific model.
17model_id = featrix.EZ_NewModel(vector_space_id,
18                               "Target_column",
19                                df_train)
20
21# Run predictions
22result = featrix.EZ_PredictionOnDataFrame(vector_space_id,
23                                          Model_id,
24                                          "Target_column",
25                                          df_test)
26
27# Now result is a list of classifications in the same symbols
28# as the target column

Predicting on a probability distribution

We can specify a few characteristics of an object and ask for the target field probability distribution. For example, in our mortgage loan demo, we might ask “what are the chances someone who is married will be approved for a loan?”

>>> # result_married_only
>>> featrix.EZ_Prediction(vector_space_id, model_id, {"Married": "Yes"})
{'<UNKNOWN>': 0.0011746988166123629, 'N': 0.33159884810447693, 'Y': 0.6672264933586121}

We can pass in multiple criteria:

>>> # result_married_and_not_graduate
>>> featrix.EZ_Prediction(vector_space_id, model_id, {"Education": "Not Graduate", "Married": "Yes"})
{'<UNKNOWN>': 0.003182089189067483, 'N': 0.5865148305892944, 'Y': 0.41030314564704895}

Classifying records

We can determine a category an object belongs to. Typically we’ll pass in a list of objects and get back a vector of which class each object targets. Featrix includes an EZ_PredictionOnDataFrame call to facilitate passing objects in bulk.

The interface is similar to sklearn’s clf.predict() functions. The target column is specified to ensure it is removed from the query dataframe before passing to the model, if it is present.

>>> featrix.EZ_PredictionOnDataFrame(vector_space_id,
                                      model_id,
                                      "Loan_Status",        # target column name
                                      query_df)
 ['Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y'
  'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'N'
  'N' 'Y' 'N' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
  'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
  'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
  'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y'
  'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y'
  'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
  'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y'
  'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
  'Y' 'Y' 'Y' 'Y' 'Y']

Note that we can use the usual sklearn functions to test accuracy, precision, and recall.

>>> from sklearn.metrics import precision_score, recall_score, accuracy_score
>>> result = # query from above
>>> accuracy_score(df_test_loan_status, result)
0.827027027027027
>>> precision_score(df_test_loan_status, result, pos_label="Y")
0.802547770700637
>>> recall_score(df_test_loan_status, result, pos_label="Y")
0.992125984251968

Regression

Prediction on a continuous variable works in the same way as a query on a categorical variable.