Usage

Installation

To use Featrix, first install the client using pip:

$ pip install featrixclient     # or pip3 install featrixclient

You’ll also need a Featrix server; you can run the enterprise edition on-site in your environment or use our hosted SaaS.

What’s Included

The featrixclient package includes a few key classes:

Featrix

A client object for accessing the Featrix service.

FeatrixUpload

Represents a data set uploaded to Featrix. This can be a dump of a table, a CSV file, etc.

FeatrixProject

Represents a Featrix project: upload associations, embedding spaces, and neural functions live in a project. You can get an existing project from the Featrix client object’s get_project_by_id() call.

FeatrixEmbeddingSpace

Represents a Featrix embedding space. Create one with project.create_embedding_space().

FeatrixNeuralFunction

Represents a Featrix neural function. Create one with embedding_space.create_neural_function(), then run it on new data with the predict() method.

Working with Data

import featrixclient as ft                # pip3 install featrixclient
import pandas as pd

path_to_your_file = "train.csv"           # replace with the path to your CSV file
df = pd.read_csv(path_to_your_file)

Train an embedding space and a model

You can train multiple models on a single embedding space.

Check out our live Google Colab demo notebooks <https://featrix.ai/demo> for examples. The general approach is as follows:

# Split the data (train_test_split comes from scikit-learn)
from sklearn.model_selection import train_test_split
df_train, df_test = train_test_split(df, test_size=0.25)

# Connect to the Featrix service with your API credentials (FEATRIX_CLIENT_ID / FEATRIX_CLIENT_SECRET);
# create them at https://app.featrix.com/
featrix = ft.new_client(client_id=FEATRIX_CLIENT_ID,
                        client_secret=FEATRIX_CLIENT_SECRET)

# Here we assume that you have used the GUI to (1) create a demo project called 'DemoProject' and (2) upload a training CSV file.
# If you need a train.csv file, we keep one handy at https://bits.featrix.com/demo-data/github.com-anujtiwari21/train.csv
demoProject = featrix.get_project_by_name("DemoProject")

# Create a new embedding space (foundational model) and train it on the data uploaded to 'DemoProject'
embeddingSpace = demoProject.create_embedding_space(wait_for_completion=True)

# We can create multiple neural functions (predictive models) within one embedding space.
# This lets us reuse representations for different predictions without retraining the embedding space.
# Note, too, that you could train a model on a different training set than the embedding space
# if you want to zero in on something for a specific model.
nf = embeddingSpace.create_neural_function(target_column='Loan_Status',
                                           wait_for_completion=True)

# Run predictions
result = nf.predict(df_test)

# result is now a list of classifications using the same labels as the target column
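The walkthrough above starts by holding out 25% of the rows with scikit-learn's train_test_split. The following minimal stdlib sketch shows what that shuffled split does, assuming the rows are a plain list of records. The helper split_rows and the seed are hypothetical. Rounding the test share up is, to our understanding, how scikit-learn sizes the test set for a float test_size.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_rows(rows: Sequence[T], test_size: float = 0.25, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle rows with a fixed seed and split them into (train, test)."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    n_test = math.ceil(test_size * len(rows))   # round the test share up
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

rows = [{"Loan_ID": i} for i in range(10)]
train, test = split_rows(rows, test_size=0.25, seed=42)
print(len(train), len(test))  # 7 3
```

Without a seed, every run of the walkthrough evaluates on a different test set. Passing random_state=42 (or any fixed integer) to train_test_split gives the same reproducibility there.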

Predicting on a probability distribution

We can specify a few characteristics of an object and ask for the probability distribution of the target field. For example, in our mortgage loan demo, we might ask “what are the chances someone who is married will be approved for a loan?”

>>> # result_married_only
>>> nf.predict({"Married": "Yes"})
{'<UNKNOWN>': 0.0011746988166123629, 'N': 0.33159884810447693, 'Y': 0.6672264933586121}

We can pass in multiple criteria:

>>> # result_married_and_not_graduate
>>> nf.predict({"Education": "Not Graduate", "Married": "Yes"})
{'<UNKNOWN>': 0.003182089189067483, 'N': 0.5865148305892944, 'Y': 0.41030314564704895}
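Each call above appears to return a plain dict that maps every target label (plus '<UNKNOWN>') to a probability. The small sketch below uses the two outputs shown above. It picks the most likely label and measures how much the extra criterion moves P(Y). The helper top_label is hypothetical and is not part of featrixclient.

```python
from typing import Dict, Tuple

# Outputs copied from the two nf.predict() calls above
married_only = {'<UNKNOWN>': 0.0011746988166123629, 'N': 0.33159884810447693, 'Y': 0.6672264933586121}
married_not_graduate = {'<UNKNOWN>': 0.003182089189067483, 'N': 0.5865148305892944, 'Y': 0.41030314564704895}

def top_label(dist: Dict[str, float]) -> Tuple[str, float]:
    """Hypothetical helper: return the (label, probability) pair with the highest probability."""
    label = max(dist, key=dist.get)
    return label, dist[label]

# The values form a probability distribution, so they sum to ~1
assert abs(sum(married_only.values()) - 1.0) < 1e-6

print(top_label(married_only))          # ('Y', 0.6672264933586121)
print(top_label(married_not_graduate))  # ('N', 0.5865148305892944)

# Effect of adding "Education": "Not Graduate" on the approval probability
delta = married_not_graduate['Y'] - married_only['Y']
print(f"P(Y) changes by {delta:+.3f}")  # P(Y) changes by -0.257
```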

What is <UNKNOWN>?

Featrix reserves a built-in symbol that appears in classification predictions as the string ‘<UNKNOWN>’. Its probability tells your application how likely it is that the training data lacks enough information to make the prediction, e.g. because of distribution shift. This adds a layer of safety: it helps you avoid acting on over-confident predictions that the data does not support. Tracking this probability over time can also help you detect shifts in your data.
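The sketch below shows one way an application might act on '<UNKNOWN>'. It abstains when the '<UNKNOWN>' probability is too high, and it compares the average '<UNKNOWN>' probability of a new batch against a baseline to flag possible shifts. The helper names and thresholds (max_unknown=0.05, factor=3.0) are hypothetical choices for illustration, not Featrix defaults.

```python
from statistics import mean
from typing import Dict, List, Optional

UNKNOWN = "<UNKNOWN>"

def decide(dist: Dict[str, float], max_unknown: float = 0.05) -> Optional[str]:
    """Return the most likely known label, or None (abstain) when <UNKNOWN> is too likely."""
    if dist.get(UNKNOWN, 0.0) > max_unknown:
        return None
    known = {label: p for label, p in dist.items() if label != UNKNOWN}
    return max(known, key=known.get)

def unknown_rate(batch: List[Dict[str, float]]) -> float:
    """Average <UNKNOWN> probability over a batch of predictions."""
    return mean(d.get(UNKNOWN, 0.0) for d in batch)

def drifted(baseline: List[Dict[str, float]], current: List[Dict[str, float]], factor: float = 3.0) -> bool:
    """Flag a batch whose average <UNKNOWN> probability is well above the baseline's."""
    return unknown_rate(current) > factor * unknown_rate(baseline)

# Baseline: the two outputs from the previous section, rounded
baseline = [{"<UNKNOWN>": 0.001, "N": 0.332, "Y": 0.667},
            {"<UNKNOWN>": 0.003, "N": 0.587, "Y": 0.410}]
# Made-up later batch in which <UNKNOWN> has become much more likely
current = [{"<UNKNOWN>": 0.20, "N": 0.30, "Y": 0.50}]

print(decide(baseline[0]))         # Y
print(decide(current[0]))          # None
print(drifted(baseline, current))  # True
```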

Classifying records

We can also determine which category each object belongs to. Typically you pass in a list of objects and get back a vector with the predicted class for each one. Featrix includes an EZ_PredictionOnDataFrame call to facilitate passing objects in bulk.

The interface is similar to scikit-learn’s clf.predict(). Because the target column is specified when the neural function is created, it is removed from the query DataFrame, if present, before the data is passed to the model.
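For a list of dict records, the target-column handling described above amounts to dropping one key from each record. The stdlib sketch below illustrates that step under this assumption. drop_target is a hypothetical helper, not Featrix's implementation.

```python
from typing import Any, Dict, List

def drop_target(records: List[Dict[str, Any]], target_column: str) -> List[Dict[str, Any]]:
    """Return copies of the records without target_column; records lacking it are copied unchanged."""
    return [{k: v for k, v in rec.items() if k != target_column} for rec in records]

query = [
    {"Married": "Yes", "Education": "Graduate", "Loan_Status": "Y"},
    {"Married": "No", "Education": "Not Graduate"},  # no target column: left as-is
]
print(drop_target(query, "Loan_Status"))
# [{'Married': 'Yes', 'Education': 'Graduate'}, {'Married': 'No', 'Education': 'Not Graduate'}]
```

With a pandas DataFrame, the equivalent is query_df.drop(columns=["Loan_Status"], errors="ignore"), which returns a new frame and leaves query_df untouched.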

>>> nf.predict(query_df)
 ['Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y'
  'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'N'
  'N' 'Y' 'N' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
  'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
  'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
  'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y'
  'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y'
  'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
  'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y'
  'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
  'Y' 'Y' 'Y' 'Y' 'Y']

Note that we can use the usual scikit-learn metric functions to measure accuracy, precision, and recall.

>>> from sklearn.metrics import precision_score, recall_score, accuracy_score
>>> result = nf.predict(query_df)   # the query from above
>>> # df_test_loan_status holds the true Loan_Status labels for the rows of query_df
>>> accuracy_score(df_test_loan_status, result)
0.827027027027027
>>> precision_score(df_test_loan_status, result, pos_label="Y")
0.802547770700637
>>> recall_score(df_test_loan_status, result, pos_label="Y")
0.992125984251968
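For reference, the small stdlib sketch below shows what those three scikit-learn metrics compute for a binary target with pos_label="Y". It runs on made-up labels, and binary_metrics is a hypothetical helper. Read against the numbers above, a recall near 0.99 with precision near 0.80 means the model almost never misses a true "Y" but sometimes predicts "Y" where the true label is "N".

```python
from typing import Dict, List

def binary_metrics(y_true: List[str], y_pred: List[str], pos_label: str = "Y") -> Dict[str, float]:
    """Accuracy, precision, and recall for one positive label (0.0 when a ratio is undefined)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)  # true positives
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)  # false positives
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

y_true = ["Y", "N", "Y", "Y", "N"]  # made-up example labels
y_pred = ["Y", "Y", "Y", "N", "N"]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.6, 'precision': 0.6666666666666666, 'recall': 0.6666666666666666}
```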

Regression

Prediction on a continuous variable works in the same way as a query on a categorical variable.
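Following the same pattern, we assume (without having confirmed it) that you would pass a numeric column as target_column to create_neural_function() and that predict() would then return numbers instead of labels. Accuracy, precision, and recall no longer apply in that case. The sketch below computes two common regression metrics, mean absolute error and root mean squared error. The LoanAmount values are invented for illustration.

```python
import math
from typing import List

def _check(y_true: List[float], y_pred: List[float]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error: the average size of a miss, in the target's units."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error: like MAE, but penalizes large misses more heavily."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Invented LoanAmount values (true vs. predicted)
y_true = [100.0, 150.0, 200.0]
y_pred = [110.0, 140.0, 200.0]
print(round(mae(y_true, y_pred), 3))   # 6.667
print(round(rmse(y_true, y_pred), 3))  # 8.165
```

scikit-learn's mean_absolute_error and mean_squared_error (take the square root for RMSE) compute the same quantities.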