|
|
June 9, 2010
Oh boy, I love debugging!
And today I really had to debug DataLoader!
I do a lot of stuff with DataLoader — see my previous blogs on the topic. But I’ve always uploaded to Salesforce from a PostgreSQL database — never extracted.
DataLoader allows a job to be configured where a SOQL query is run, the data is extracted from Salesforce and the results are inserted into a database. The documentation is woeful on this whole topic, but fortunately DataLoader comes with sample process-conf.xml and database-conf.xml files that contain some examples.
The basic method is:
process-conf.xml defines the job and includes the SOQL in a extractionSOQL parameter
- An
SDL file defines the field mapping between Salesforce and the output
database-conf.xml defines the SQL statement to use, plus the parameter types
First hint: The Field names in the SDL file are case-sensitive. Use “ID” instead of “Id” and the field won’t come across.
Second hint: All fields come across in quotes. This totally destroys any attempt to load numbers!
Here’s part of the database-conf.xml that I used:
<bean id="extractMegaphoneQuery"
class="com.salesforce.dataloader.dao.database.SqlConfig" singleton="true">
<property name="sqlString">
<value>
INSERT INTO renewals_megaphone (
period, megaphone_date, opportunity_name, reason_lost, owner, expiry_date, product, years_owned)
VALUES (@period@, @megaphone_date@, @opportunity_name@, @reason_lost@, @owner@,
@expiry_date@, @product@, @years_owned@)
</value>
</property>
<property name="sqlParams">
<map>
<entry key="period" value="bigint"/>
<entry key="megaphone_date" value="java.sql.Date"/>
<entry key="opportunity_name" value="java.lang.String"/>
<entry key="reason_lost" value="java.lang.String"/>
<entry key="owner" value="java.lang.String"/>
<entry key="expiry_date" value="java.sql.Date"/>
<entry key="product" value="java.lang.String"/>
<entry key="years_owned" value="integer"/>
</map>
</property>
</bean>
You’ll notice I have two fields that are numbers — period (bigint) and years_owned (integer). When trying to load these fields into my database, I kept getting the message:
Sql error: ERROR: column "period" is of type bigint but expression is of type character varying.
org.postgresql.util.PSQLException: ERROR: column "period" is of type bigint but expression is of type character varying
There was no way that I could convince DataLoader that the field was numeric — I tried lots of java.lang.Integer type entries in the sqlParams, but didn’t help.
I then changed the output to csvWrite instead of databaseWrite so that I could see the output, and it came through like this:
"PERIOD","MEGAPHONE_DATE","OPPORTUNITY_NAME","REASON_LOST","OWNER","EXPIRY_DATE","PRODUCT","YEARS_OWNED"
"83823.0","2010-06-08","Joe Smith","Invalid","John","2010-03-14","a0624004000qRtkgA2","3.0"
You’ll notice that the first and last columns are meant to be numbers, but they are coming through in quotation marks. That’s what was causing the database load to fail!
So, I updated my SQL to convert the strings to numbers:
<property name="sqlString">
<value>
INSERT INTO renewals_megaphone (
period, megaphone_date, opportunity_name, reason_lost, owner, expiry_date, product, years_owned)
VALUES (@period@::numeric, @megaphone_date@, @opportunity_name@, @reason_lost@, @owner@, @expiry_date@, @product@, @years_owned@::numeric)
</value>
</property>
This successfully cast the strings into numbers and they finally got stored in the database (which automatically ignored the decimal places when converting to bigint and integer).
I hope the blog post saves other people similar problems!
The Bottom Line
- DataLoader can extract data and insert it into a database
- Numbers are quoted on export and cause database errors
- I type-cast the
string into numeric to enable the database to load it
June 8, 2010
I was configuring DataLoader to export a list of Opportunities, and I went to select the “Owner Name”. However, only OwnerID is available on an Opportunity.
“No problem!” I think to myself, as I go and create a Custom Field with a formula equal to Owner.Alias.
“What?” I say in surprise. “It won’t let me access a field on the Owner object!”

Mmm. This is strange. Then a Google Search reveals an 18-month old Ideas request to Make “Owner Id” Look up fields available for formulas.
Oh dear.
Well, that’s a shame, but it’s easily solved! I created:
- A field called
Owner_Link__c of type Lookup(User)
- A Trigger to copy
OwnerId to Owner_Link__c when the Owner is changed
- A test for the Trigger
Trigger:
trigger Update_OwnerLink_on_Owner_Update on Opportunity (before update, before insert) {
// When 'Owner' field is changed, update 'OwnerLink' too
// Loop through the incoming records
for (Opportunity o : Trigger.new) {
// Has Owner chagned?
if (o.OwnerID != o.Owner_Link__c) {
o.Owner_Link__c = o.OwnerId;
}
}
}
Test:
public with sharing class TriggerTest_OwnerLink {
static TestMethod void testOwnerLink() {
// Grab two Users
User[] users = [select Id from User limit 2];
User u1 = users[0];
User u2 = users[1];
// Create an Opportunity
System.debug('Creating Opportunity');
Opportunity o1 = new Opportunity(CloseDate = Date.newInstance(2008, 01, 01), Name = 'Test Opportunity', StageName = 'New', OwnerId = u1.Id);
insert o1;
// Test: Owner_Link should be set to user 1
Opportunity o2 = [select id, OwnerId, Owner_Link__c from Opportunity where Id = :o1.Id];
System.assertEquals(u1.Id, o2.OwnerId);
System.assertEquals(u1.Id, o2.Owner_Link__c);
// Modify Owner
o2.OwnerId = u2.Id;
update o2;
// Test: Owner_Link should be set to user 2
Opportunity o3 = [select id, OwnerId, Owner_Link__c from Opportunity where Id = :o2.Id];
System.assertEquals(u2.Id, o3.OwnerId);
System.assertEquals(u2.Id, o3.Owner_Link__c);
}
}
This then gave me a new Owner object on my Opportunity on which I could create Formulas:

I could also use it in the DataLoader by referring to Owner_Link__r.Alias.
Hooray!
Easy to solve, but it’s a shame it was necessary.
The Bottom Line
- Formulas can’t access
Opportunity.Owner fields
- Create a ’shadow’ field to hold Owner and populate it via a Trigger
August 18, 2008
Today I had to create campaigns to track Trade Show attendees. I’ve written previously about creating campaigns from Webex attendee files, but my Trade Show source files were not as friendly. Each Trade Show organiser gave us lists in different formats — the best one contained complete Name, Address, Company, Title and Email information, while the worst had just First Name and Email (yes, really!).
My requirements were:
- Load information from disparate source documents
- Create Contacts, not Leads (we don’t use Leads)
- Don’t overwrite existing Contact records (existing customers are populated from an External system, with more accurate information than Trade Show visitor data)
- Add them to a Campaign for tracking purposes
Importing Contacts
The first step was quite easy with the help of the Salesforce.com Data Loader (known as LexiLoader on the Mac). It can import Contact records from a CSV file. The trick here, however, was to use INSERT rather than UPSERT. My system is configured to use Email as an External ID for Contacts, so the import will fail for records where there is an existing Contact with the same Email address. This is good — it lets me create new records only where the Contact already does not exist.
Also as part of the import, I added a column for RecordTypeId. This is because I have two Record Types for my contacts:
- Manually created contacts (which let users edit most fields) — this is the one I selected, and
- Automatically created contacts (which come from my external system, and don’t let users edit fields since they are slaved from my external system)
I also added a column for AccountId, which is quite important. Any Contact records that aren’t linked to an Account are only visible to the record creator. In our case, we don’t actually use Accounts (I’ll blog about that in future), so I just linked all the new Contacts to a ‘Default Account’ to make them visible.
Adding Contacts to Campaigns
There’s lots of ways to add Campaign Members to Campaigns, either from Lead/Contact records or from the Campaign record. In this case, I wanted to add them in bulk, so I used the Manage Members button:

To load in bulk, I wanted to use an “Import File” capability, but I didn’t want to load Leads. So, the Add Members – Import File option was no good, since it only loads records as Leads. Instead, I used the Update Status – Import File option which can not only update a status, but also associate a Contact with a Campaign. In other words, it can take existing Contacts and add them to a Campaign — Perfect!
Unfortunately, Import File wants a file that contains the Salesforce Contact ID. This is a problem, since the ID is generated within the Force.com platform and is not in my CSV file. The online hints for that command suggest creating a Report on campaign members, saving as CSV then importing them to change the status. This is not an option for me since some of the people are already in Salesforce.com as existing Contacts and others I’ve just created. So, there’s no identifier that lets me select them all in a Report.
So, here’s a trick… After using DataLoader, two files are created. A success file contains a copy of all records that were successfully loaded, with a new ID column inserted. I can use this file with the Import command to update all the statuses. This takes care of all the Contacts that were created during the original Insert of Contacts. Fantastic!
But what about the Contacts that already existed, and therefore caused an error during the Insert? Well, the ID is actually supplied in the error message (eg duplicate value found: EmailAddr__c duplicates value on record with id: 0032000000Fs7Xp) and can be extracted using a RIGHT(B2,15) type of spreadsheet command. Just put that value in a new column, import the file and you’re done!
(To slightly simplify things, it’s also possible to load just one file — the original CSV file! Since all records now exist, every row will generate an error. You can then just process the error file rather than separately loading the success and error files.)

I’ll admit that during this process I stuffed up — I imported attendees to the wrong campaign. To remove them, I could have deleted the Campaign and started again. Instead, I used a favourite utility called Mass Delete Anything (there’s a Windows and a Mac version). I just created an appropriate SOQL command to identify the offending records, then Mass Deleted them. Very handy, but also very dangerous. Use with care on your Production systems.
The end result of all this… My Campaigns can now report counts of Total Contacts for the Trade Shows. When combined with a Trigger I’ve got that identifies people when they purchase products, I can track the conversion rate for Trade Show attendees, all without having to manually type any Contact information.
The Bottom Line
- You can’t import Contacts directly into Campaigns, but you can import a list of existing Contacts to link to a Campaign
- DataLoader saves the day again, especially with the information saved in its
success and error output files
August 14, 2008
I love upserts. They are a combination of Inserts and Updates. The key to using Upsert is to nominate a field that contains an unique identifier for the record. While importing, if a record already exists with the same identifier, the record is Updated. If no such record exists, a new record is Inserted.
A good identifier is, of course, the Salesforce ID that is assigned to each record. However, there’s two significant problems in using the Salesforce ID:
- It varies across instances (eg sandbox, production, developer account), which means that you can’t import the same IDs in different systems
- It is generated by the Force.com platform — you can’t create it yourself
The Good News
Enter the External ID. This is a flag that can be set on Custom Fields that identifies a field as being an Unique Identifer, and allows that field to be used in Upserts.

My favourite use for an External ID is in storing the unique identifier from an External System (eg a customer database). If I load my customers into Salesforce.com, I can then use Upsert to reload them next time without creating duplicate records. The external system doesn’t even need to know that Salesforce exists, and certainly doesn’t have to store the Salesforce IDs.
The Bad News
I load a lot of customer information, and the best identifier for customers is their Email Address. Unfortunately, the Email field on Contact records is a Standard field and only Custom fields can be defined as an External ID.
To get around this limitation, I’ve created my own Email field (EmailAddr__c) that I define as an External ID. However, this leads to further problems:
- Salesforce.com uses the normal “Email” field for sending emails (obvious, eh?)
- Lots of existing lists, views and reports use the standard Email field
- Things get very confusing when you have more than one ‘Email’ field
So, I created a Workflow that keeps the two fields in Sync. I import into the EmailAddr__c field and then get the workflow to update the stadnard Email field.


I also change the label of the standard Email field to “(Unused Email Field)” so that I don’t accidentally select it in Page Layouts.
What’s that you say? Why don’t I just use the normal Email field since the Workflow sets it to the same value? Well, the answer lies in the field type. The standard Email field is of type “Email”, which means that Salesforce.com performs validation to ensure the address ‘looks like’ an email field. However, my external system does not have similar checking. Therefore, I want to use the email address from my external system as an unique identifier even if it is a badly formatted address.
I’ve found that by importing directly into the Email field, the Data Loader will reject a record if it is a badly-formed email address, but copying it from my EmailAddr__c field into the Email field seems to avoid this checking. At times I also had the situation where EmailAddr__c had a value, but Email was blank. So, I find it a lot safer to use my custom email field, since I know it will always have a value. (We’ve since improved our external system to do similar validation on the email field, but we’ve still got lots of historical records with the problem.)
The Bottom Line
- We can now use an Email Address as an External ID
- We can now Upsert on Email Addresses
- It’s a little messy, but it works
- I wish Salesforce.com would let us define Standard fields as External IDs
May 14, 2008
I arrived at work today to get a mysterious message from one of my users. Opportunities, for some strange reason, were suddenly appearing in the system and being marked as already Won!
While this is great from a sales perspective (we all like Won opportunities!), it should not have been happening from a data perspective. Perhaps a bit of explanation is in order…
My employer (a great Australia software company) sells software. Actually, since people can download and trial our software for free, it would be more accurate to say that we sell software licenses. Each year, we encourage our customers to renew their software maintenance so that they can get support and upgrades. So, our primary use for Salesforce.com is to track Renewal Opportunities.
These opportunities are perfect for automation. Our existing licensing system keeps track of customers, so I use the Salesforce.com Data Loader to import a list of products that have expired and we have dedicated Renewal Specialists who contact the customers to see if they would like to renew. We typically get around 70% renewing, so they must like our software! (Unlike other Enterprise Software companies, they can keep using our software even if they don’t pay annual maintenance.)
The basic system is:
- An SQL view on our customer database looks for expired licenses
- The Data Loader uses this view to extract information from the database and load it directly into Salesforce as an Opportunity object
- I use an Upsert operation with my licensing system supplying an External ID. This means that I can load new and existing opportunities in one batch, without my external system even having to know about Salesforce. As long as I supply the same External ID, Salesforce will automatically update existing records or create new ones if they don’t exist.
- The data is reloaded overnight to show any information that has updated or any Opportunities that have closed. In fact, the only way to mark an Opportunity as Won is via the DataLoader — staff can’t claim a Closed/Won situation unless the external system says it has actually been sold.
So, why did we start getting Opportunities appearing long after expiry, when they had already been won?
Well, I tracked it down to the fact that I’m only creating Opportunities for customers in the Americas region, since that’s where our Renewal Specialist staff are located. It appears that some of our customers had incorrect address information so the system couldn’t understand their Country. My little SQL function that converts Country into Region didn’t realise that the Opportunities were in the Americas Region since they were missing the right country information.
Fortunately, we’ve had some staff manually cleaning our customer data. It just happened that they fixed the address information for a couple of customers who had expired licenses. This then caused the Opportunity to magically appear in Salesforce — later than should have happened. I managed to track this down from some ‘Modified date’ fields we’ve got on our data that I use for incremental loading (more on that topic another day). Plus, the automatic ‘Created by’ field on records gave great information on when records appeared.
The Bottom Line
- Clean data is good
- Automation is great, but can miss some things that humans would notice
- Audit tracking of fields and records can help ‘piece together’ clues in mysterious situations
|