Python and Salesforce

Today’s post is about, you guessed it, using Python and Salesforce together. Salesforce offers a few different REST API endpoints that allow us to interact with our org. They also offer a SOAP API, but we’re not going to use it today. In fact, we don’t need to worry about the endpoints at all. Thanks to a Python library called Simple-Salesforce.

 

We could do this all by hand with the built-in requests library. You would have to handle sessions, OAuth process and save tokens, deal with request headers, encoding and decoding JSON, creating configs to handle all the different endpoints, etc…

 

Simple-Salesforce Python Library

 

 

This is where the wonderful world of open source software comes to the rescue. A man named Nick Catalano created a library called simple-salesforce. From what I understand, he isn’t actively developing it anymore, but due to open source, the community has picked up adding features. It has about 50 contributors as of July 8, 2020. The lib is being actively developed, with new features added like bulk API support and formatted SOQL queries!

 

 

With a bit of the background out of the way, let’s start digging into the library and see what we can do with it. First, no better way to explain what simple-salesforce is than to quote the README itself:

 

Simple Salesforce is a basic Salesforce.com REST API client built for Python 3.3, 3.4, 3.5, and 3.6. The goal is to provide a very low-level interface to the REST Resource and APEX API, returning a dictionary of the API JSON response.

Simple Salesforce

 

In plain text this quote means Python 3.x is supported, REST requests are handled for us, and the data we want comes back as a native Python data type. Great! Just what we want.

 

Login Information

 

First up, we’ll need a few things from Salesforce. Three things to be exact. Our username (email), password, and security token. The username and password should be pretty easy to find in your profile settings in your Salesforce org, but where do we find the security token? We don’t really see it, rather we’ll need to reset it.

 

 

 

 

 

 

 

 

 

After clicking “Reset Security Token” , you should be sent an email containing your new token. Save this, we’ll need it in the next steps. That’s all we need from Salesforce to get up and running. Now that we have that, let’s start building our Python script and start playing with our org.

 

pip and pipenv

 

 

But first, a quick word about PIP. While working on this example, there was an update to simple-salesforce lib. Github has the current version and we need the format_soql method from it. But pip (PyPi) hasn’t been updated with the new version as of yet, July 2 2020. So, we’ll need to install it via it’s repo on Github.

 

pipenv install -e git+https://github.com/simple-salesforce/
simple-salesforce.git#egg=simple-salesforce

If you are using the demo repo I built, we won’t need to worry about the requirements or dependencies if using pipenv. The demo has a Pipfile that’s pulling from the repo already thanks to the magic of pipenv.

The code

Now, let’s write some code. First up, we’ll bring in the libs we’re going to use. The only one that is not part of the standard lib is simple-salesforce:

from simple_salesforce import Salesforce, format_soql
from pprint import pprint
import json

Simple enough, import simple-salesforce, pprint (pretty print), and json. pprint is only used to make the terminal output look better, so it’s not needed for the core examples. json is used to get the credentials from a .json file, which is what we’ll do next.

# open file holding our login information
# we have the login info in a separate file so we can
# add it to .gitignore to help prevent leaking the information
# environment variables could also work
with open("login.json", "r") as login_file:
    creds = json.load(login_file)

Keep credentials secret

If you were on a project working in a private repo and multiple people needed access to the login credentials, keeping login info in the script itself would be okay, not ideal but okay. The reason we’re doing it is because it’s good practice and it’s open to the public on Github. So hiding and not committing sensitive information is important. This goes for any language or code base.

With that disclaimer out of the way, I included an example login.json file called very creatively “EXAMPLE_login.json“. Let’s take a quick look at it.

{
    "login": {
        "username": "sfdemo@kbcarte.com",
        "password": "My-REALLY-amazing-password",
        "token": "kdjfghdgfFGJbDFgd36DFGHDfgh"
    }
}

Very simple json object only containing the three things we got from the last steps. You can just copy/paste the “EXAMPLE_login.json” and rename it to just “login.json“, then update it with your login information. You can do this in the file explorer or VSCode, but here’s a quick example to do it from the command line.

cp EXAMPLE_login.json login.json
vim login.json

Salesforce Object

With our new login information, we can create the salesforce object back in our python script.

sf = Salesforce(username=creds['login']['username'],
                password=creds['login']['password'],
                security_token=creds['login']['token'])

And that’s it! We now have an object that represents our org, and now we can start doing cool things like SOQL or DML. Next, since we have everything we need to start awesomeness, let’s try a simple SOQL query.

SOQL Query

# an example of running a simple SOQL query
SOQL = "SELECT Id, Email FROM Contact"
data = sf.query(SOQL)
for d in data['records']:
    pprint(f"{d['Id']} -- {d['Email']}")

We defined the query string to get the Id and Email from all the Contacts, called the query() method of our salesforce object, then looped through the returned records and display the results. A note to those new to Python, in the pprint() we use something called an f-string or format string. It makes it easier to embed variables in strings, much like the way Aura Components handle expressions with {!v.my_var}

SOQL is cool and all, but what about adding new data to our org? We can do that too, very easily. We’ll even try using the bulk api to insert 1,000 record. But first we need to create 1,000 records. This is going to be mock or fake data just for the sake of simplicity. We’ll also be testing on the Account object, so only thing required for new records is the Name field.

data = []
for i in range(0,1000):
    data.append({
        "Name": f"Bulk Test {i}"
    })

Now we have a list of dictionaries that represent our record data. Here we have a for loop filling a list with new items. We could also use list comprehension to replace these 5 lines of code with just one.

data = [{'Name': f"Bulk Test {i}"} for i in range(0, 1000)]

Bulk Insert

To bring these new records into Salesforce, we use the .insert() method for the object we want, coming from the salesforce object we created. Confused yet? Here’s the insert code, it should help make things more clear.

# insert the new account using the bulk api
x = sf.bulk.Account.insert(data, batch_size=10000, use_serial=True)
pprint(x)

Here, we’re telling our org we want to use the bulk api “sf.bulk“, then which record object we’re working with “.Account.“, and finally what we want to actually do “.insert()“. We could use any object too, doesn’t have to be just Account. Even custom objects work, so instead of where Account is, we can replace it with something like sf.bulk.MyCustomObj__c.insert(..... We can also specify the batch size, or to process in serial.

Bulk SOQL

If you visit your org and take a look at all Accounts, you should see 1,000 new accounts with names like “Bulk Test 42”. We can also try doing another SOQL query, this time we’ll use the bulk api for the query. We’ll also show how to use things such as “LIKE” in SOQL statements.

# now lets get those records so we can delete them
SOQL = format_soql("SELECT Id, Name FROM Account WHERE Name LIKE '{:like}%'", "Bulk Test")
the_accounts = sf.bulk.Account.query(SOQL)
pprint(the_accounts)

Simple-salesforce now comes with the handy little method called format_soql that allows us to do things such as LIKE. format_soql is also the reason we used the Github repo for pip instead of what’s on PyPi, the new method was just introduced in the last 2 weeks.

Now that we know we can insert and add new data and records to salesforce, let’s see about removing it. Looking at the README, seems like the delete() method needs a list of records containing the Id of the record to be deleted. The records need to be a key-value pairs or dictionary just like how the query was returned. We already have all the Id’s for our inserted record from the “bulk.query()

{'Id': '0013h00000EdP87AAF',
  'Name': 'Bulk Test 998',
  'attributes': {'type': 'Account',
                 'url': '/services/data/v42.0/sobjects/Account/0013h00000EdP87AAF'}}]

Looks like simple-salesforce also returns something called “attributes“. This just tells use which object we’re working with and which api endpoint we got the information from. For our example, and for the .delete() method, we only need the Ids. So let’s clean up and make a new list with only what we want.

account_ids = []
for a in the_accounts:
    account_ids.append({"Id": a["Id"]})
pprint(account_ids)

Here is another opportunity to practice list comprehension! See if you can get that for loop down to one line.

Bulk Delete

So now we have the list of Id’s, now we just simply call the delete() method and go refresh our org’s Account list.

data = sf.bulk.Account.delete(account_ids, batch_size=10000, use_serial=True)
pprint(data)

All the records we inserted are now gone!

Final Thoughts

In conclusion, let’s recap what we’ve learned. For one, Python is awesome. Second, integrating Salesforce in a Python project is very very simple. With only three pieces of information, we can create a native Python object representing our Salesforce org, do things with the object like SOQL and DML, and finally have access to multiple salesforce API such as bulk or search.

This is only scratching the surface. We can do much more and simple-salesforce also has methods for metadata and describe. From this basic example, we could bring in Salesforce data to Flask API’s we build or insert new data to Salesforce from a data scraping crawler we make in Python.

We can also harness the power of Python’s ecosystem of visualization and reporting such as Pandas or SciPy. Nothing stopping us from grabbing Salesforce data and running it through machine learning or neural networks using PyTorch or TensorFlow.

If you’ve made it this far, thanks for reading! Here I’ll link the demo repo containing all the code discussed in this post and a YouTube video of my presentation. Also, if you are interested in learning more about Python, you can check this article: How to do Time Series Analysis with Python & Pandas.

https://www.youtube.com/watch?v=rf1jx3jbL2M&feature=youtu.be

How to do Time Series Analysis with Python & Pandas

First steps on analyzing and stationarizing of time series data

A while ago I had the opportunity to work on building a sales forecaster as a POC. It was a challenging project with a cool MVP as an outcome. Through this post, I will share part of my journey and findings on analyzing the data I was provided.

Assumptions

I will assume you have previous knowledge of both Python and Pandas.

First things first…

This project started like every other data science project: familiarizing ourselves with the data we had in hand. I did this by importing the CSV file provided as a data source.

Once I had a clear idea of what kind of data we were dealing with, I proceeded with the initial exploration and usual transformations.

To simplify future manipulations over Pandas DataFrame, I made ‘fecha’ the index of the DataFrame. Since the records already came in the correct order, it was simple to perform this transformation and convert the DataFrame into a series with a ‘daily-level’ frequency, by resampling the entire DataFrame.

After completing the above transformations, the data was ready to be plotted. With the help of the Matplotlib library, I was able to display a graph of the quantity of product sold per day throughout the years.

So…what’s a time series and what makes it special?

From the initial data exploration, it was clear we were dealing with what is known as a time series. Time series is just a fancy way of saying we are dealing with data points indexed in time order.
Usually, when dealing with time series, we look for some special characteristics in our data to be able to make predictions based on it. Specifically, we look for a time series that is stationary.

Stationarity of a time series

We can say that a time series is stationary when its mean and variance are not a function of time (i.e. they are constant through time).
Stationarity is important because most of the statistical methods to perform analysis and forecasting work on the assumption that the statistical properties (mean, variance, correlation, etc.) of the series are constant in time.

How to test the stationarity of a time series?

Stationarity can be assessed in two ways:
● Visually inspect the data points and check how the statistical properties vary in time
● Perform a Dickey-Fuller test

Let us take a visual approach first and see how it goes:

By plotting the standard deviation and mean along with the original data points, we can see that both of them are somewhat constant in time. However, they seem to follow a cyclical behavior.

Although the visual approach can give us a clue, applying the Dicky-Fuller Test (DF-test) can provide a more precise way to measure the stationarity of our series.

Results of DF-test

I will not go through much detail on how the DF-test work, but let’s say all we need to care about is the numbers we see in “Test Statistic” and “Critical Values”. We always want the former to be less than the latter. And the lesser the value of Test Statistic the better.

Our series is stationary given that Test Statistic is less than all the Critical Values, though not by much.

Below you can see the code I used to evaluate the stationarity:

What if our time series was non-stationary?

There are some techniques one can apply to stationaries a time series. The two I am more familiar with are:
Transformation: apply transformation which penalizes higher values more than smaller values. These can be taking a log, square root, cube root, etc. This method helps in reducing the trend.
Differencing: take the difference of the observation at a particular instant with that at the previous point in time. This deals with both trend and seasonality, hence improving stationarity.

Pandas and NumPy provide you with very practical ways to apply these techniques.

For the sake of demonstration, I will apply a log transformation to the DataFrame.

Bonus track: We can even apply a smoothing technique over the transformed data set to remove the noise that may be present. A common smoothing technique is to subtract the Moving Average from the data set. This can be achieved as easy as:

Clearly, we can see that applying log transformation + moving average smoothing to our original series resulted in a better series; in terms of stationarity.

To apply differencing, Pandas shift() function can be used. In this case, first-order differencing was applied using the following code.

Let us perform a DF-test on this new resulting series.

With the log transformation and differencing the test statistic is significantly smaller than the critical values, therefore this series is too more stationary than the original series.

Wrapping up…

When we face a predictive task that involves a time series, we need to analyze said series and determine whether it is stationary or not. To determine the stationarity, we can either plot the data and visually inspect the mean and other statistical properties or perform a Dickey-Fuller Test and look at the Test Statistic and Critical Values. In case the series happens to be non-stationary, we can apply techniques such as transformation or differencing to stationarize the series.

After all this analysis and preparation, the next step in the project was to forecast with the time series, but that’s a topic for another post!

Learn more from our team here, or check out our custom development services