Usage
Installation
To use Featrix, first install the client using pip:
$ pip install featrix-client # Coming soon.
You’ll also need a Featrix server; you can run the enterprise edition on-site in your environment or use our hosted SaaS.
What’s Included
The featrix-client package includes a few key modules:
- A FeatrixTransformerClient for accessing a Featrix embedding service.
- A set of functions for plotting embedding similarity.
- A set of functions for working with data that we have found to be useful.
Working with Data
import featrixclient as ft
import pandas as pd
df = pd.read_csv(path_to_your_file)
Train a vector space and a model
You can train multiple models on a single vector space.
Check out our live Google Colab demo notebooks (https://featrix.ai/demo) for examples. The general approach is as follows:
from sklearn.model_selection import train_test_split

# Split the data
df_train, df_test = train_test_split(df, test_size=0.25)

# Connect to the Featrix server. This can be deployed on-premises with
# Docker or hosted in Featrix’s public cloud.
featrix = ft.Featrix("http://embedding.featrix.com:8080")

# Here we create a new vector space and train it on the data.
vector_space_id = featrix.EZ_NewVectorSpace(df_train)

# We can create multiple models within a single vector space.
# This lets us re-use representations for different predictions
# without retraining the vector space.
# Note, too, that you could train the model on a different training
# set than the vector space, if you want to zero in on something
# for a specific model.
model_id = featrix.EZ_NewModel(vector_space_id,
                               "Target_column",
                               df_train)

# Run predictions
result = featrix.EZ_PredictionOnDataFrame(vector_space_id,
                                          model_id,
                                          "Target_column",
                                          df_test)

# Now result is a list of classifications in the same symbols
# as the target column
Predicting on a probability distribution
We can specify a few characteristics of an object and ask for the target field probability distribution. For example, in our mortgage loan demo, we might ask “what are the chances someone who is married will be approved for a loan?”
>>> # result_married_only
>>> featrix.EZ_Prediction(vector_space_id, model_id, {"Married": "Yes"})
{'<UNKNOWN>': 0.0011746988166123629, 'N': 0.33159884810447693, 'Y': 0.6672264933586121}
We can pass in multiple criteria:
>>> # result_married_and_not_graduate
>>> featrix.EZ_Prediction(vector_space_id, model_id, {"Education": "Not Graduate", "Married": "Yes"})
{'<UNKNOWN>': 0.003182089189067483, 'N': 0.5865148305892944, 'Y': 0.41030314564704895}
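The returned distributions are ordinary dictionaries, so they can be compared with plain Python. The sketch below is only an illustration: the two dictionaries are copied from the outputs above, and the helper names most_likely and probability_shift are hypothetical, not part of the Featrix client.

```python
# Distributions copied from the two EZ_Prediction outputs above.
married_only = {
    "<UNKNOWN>": 0.0011746988166123629,
    "N": 0.33159884810447693,
    "Y": 0.6672264933586121,
}
married_not_graduate = {
    "<UNKNOWN>": 0.003182089189067483,
    "N": 0.5865148305892944,
    "Y": 0.41030314564704895,
}


def most_likely(distribution):
    """Return the (label, probability) pair with the highest probability."""
    label = max(distribution, key=distribution.get)
    return label, distribution[label]


def probability_shift(before, after, label):
    """Return the change in probability of `label` from `before` to `after`."""
    return after[label] - before[label]


# Hypothetical helpers, shown here on the documented outputs.
print(most_likely(married_only))
print(most_likely(married_not_graduate))
print(probability_shift(married_only, married_not_graduate, "Y"))
```

With these numbers, adding the "Not Graduate" criterion moves the most likely outcome from 'Y' to 'N'.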
Classifying records
We can determine which category an object belongs to. Typically we pass in a list of objects and get back a vector containing the predicted class for each object. Featrix includes an EZ_PredictionOnDataFrame call to facilitate passing objects in bulk.
The interface is similar to sklearn’s clf.predict() functions. The target column name is specified so that the column, if present, is removed from the query dataframe before it is passed to the model.
>>> featrix.EZ_PredictionOnDataFrame(vector_space_id,
...                                  model_id,
...                                  "Loan_Status",  # target column name
...                                  query_df)
['Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y'
'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'N'
'N' 'Y' 'N' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y'
'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y'
'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y'
'Y' 'Y' 'Y' 'Y' 'N' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y' 'Y'
'Y' 'Y' 'Y' 'Y' 'Y']
Note that we can use the usual sklearn functions to test accuracy, precision, and recall.
>>> from sklearn.metrics import precision_score, recall_score, accuracy_score
>>> result = featrix.EZ_PredictionOnDataFrame(vector_space_id, model_id, "Loan_Status", query_df)  # query from above
>>> accuracy_score(df_test_loan_status, result)
0.827027027027027
>>> precision_score(df_test_loan_status, result, pos_label="Y")
0.802547770700637
>>> recall_score(df_test_loan_status, result, pos_label="Y")
0.992125984251968
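To make these three scores concrete, here is a minimal pure-Python sketch of what they compute for a binary target. The function name binary_metrics and the six labels are made up for illustration; they are not Featrix output and do not reproduce the numbers above.

```python
def binary_metrics(y_true, y_pred, pos_label="Y"):
    """Return (accuracy, precision, recall) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Illustrative labels only (hypothetical data).
y_true = ["Y", "Y", "N", "N", "Y", "N"]
y_pred = ["Y", "Y", "Y", "N", "Y", "Y"]

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

High recall combined with lower precision, the same pattern as in the scores above, means the model rarely misses a true 'Y' but sometimes predicts 'Y' for records that are actually 'N'.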
Regression
Prediction on a continuous variable works in the same way as a query on a categorical variable.
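For a continuous target, the classification scores above no longer apply, and error-based metrics are the usual substitute. The sketch below assumes the predictions come back as a sequence of numbers, which this page does not confirm; the function names and values are hypothetical and chosen only for illustration.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Made-up numbers, not Featrix output.
actual = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 230.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, rmse)
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of how a regression model behaves.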