Tags: Data Loader, SOQL

Our Salesforce instance loads quite a bit of data via the Data Loader (so much that it sometimes screws things up). This is done in automated batches via the command line. Better yet, the information is extracted directly from our JDBC database — no disk files are involved! To top it off, it even runs on a Linux system (which was a little difficult, because Data Loader is only distributed as a Windows executable!).

We also export some data from Salesforce, again inserting it directly into a database.

My challenge for today was to extract some information from the Opportunity Stage History (kept in the OpportunityHistory object). Specifically, I wanted to extract some Opportunity data together with the date that the Opportunity’s Stage was set to ‘Lost’. This required ‘joining’ between the Opportunity and OpportunityHistory objects.

I referred to the trusty SOQL Relationship Queries documentation and wound up writing a query on the OpportunityHistory object that also included data from the ‘parent’ Opportunity object:

select
  Opportunity.Reason_Lost__c, -- My custom fields
  Opportunity.Expiry_date__c,
  Opportunity.Product_Family__c,
  CreatedDate    -- Date on the Stage History record
from OpportunityHistory
where
  StageName = 'Lost'  -- On OpportunityHistory
  and Opportunity.Reason_lost__c != ''
  and Opportunity.CreatedDate >= 2009-11-01T00:00:00Z  -- I always hate having to write dates like this!

This very nicely returned Opportunity data together with the date that the Stage was changed to ‘Lost’.

However, I had a lot of trouble getting it to load, and then I realised that my SDL file (used to map fields) also had to have the ‘Opportunity.‘ prefix!

In export_map_reason_lost.sdl:

Opportunity.Reason_Lost__c=reason_lost
Opportunity.Product_Family__c=product
Opportunity.Expiry_Date__c=expiry_date
CreatedDate=lost_date

At this point, I should give a shout-out to my favourite Salesforce employee, Simon Fell (see my interview with him at Dreamforce 2009). He has written a great Mac utility called Soql Xplorer that makes writing SOQL a snap!

To assist any readers who are trying to nut out how to use Data Loader to push/pull data to/from a database, here are the relevant entries in my XML files:

In process-conf.xml:

    <bean id="extractReasonLost"
          class="com.salesforce.dataloader.process.ProcessRunner"
          singleton="false">
        <description>Extract Lost Opportunities to discover Reason Lost</description>
        <property name="name" value="extractReasonLostName"/>
        <property name="configOverrideMap">
            <map>
                <entry key="sfdc.endpoint" value="https://emea.salesforce.com"/>
                <entry key="sfdc.username" value="john.rotenstein@atlassian.com"/>
                <entry key="sfdc.password" value="secret"/>
                <entry key="sfdc.timeoutSecs" value="600"/>
                <entry key="sfdc.loadBatchSize" value="100"/>
                <entry key="sfdc.entity" value="Opportunity"/>
                <entry key="process.enableLastRunOutput" value="false" />
                <entry key="sfdc.extractionRequestSize" value="500"/>
                <entry key="sfdc.extractionSOQL" value="select Opportunity.Reason_Lost__c, Opportunity.Expiry_date__c,
                                 Opportunity.Product_Family__c, CreatedDate
                                 from OpportunityHistory where StageName = 'Lost' and Opportunity.Reason_lost__c != ''
                                 and Opportunity.CreatedDate >= 2009-11-01T00:00:00Z"/>
                <entry key="process.operation" value="extract"/>
                <entry key="process.mappingFile" value="export_map_reason_lost.sdl"/>
                <entry key="dataAccess.type" value="databaseWrite"/>
                <entry key="dataAccess.name" value="extractReasonLostBean"/>
            </map>
        </property>
    </bean>

In database-conf.xml:

<bean id="extractReasonLostBean"
      class="com.salesforce.dataloader.dao.database.DatabaseConfig"
      singleton="true">
    <property name="sqlConfig"  ref="extractReasonLostQuery"/>
    <property name="dataSource" ref="server_name"/>
</bean>

<bean id="extractReasonLostQuery"
      class="com.salesforce.dataloader.dao.database.SqlConfig" singleton="true">
    <property name="sqlString">
        <value>
            INSERT INTO renewals_lost (
               period, reason_lost, expiry_date, product, lost_date)
            VALUES (@period@::numeric, @reason_lost@, @expiry_date@, @product@, @lost_date@ )
        </value>
    </property>
    <property name="sqlParams">
        <map>
            <entry key="period"           value="integer"/>
            <entry key="reason_lost"      value="java.lang.String"/>
            <entry key="expiry_date"      value="java.sql.Date"/>
            <entry key="product"          value="java.lang.String"/>
            <entry key="lost_date"        value="java.sql.Date"/>
        </map>
    </property>
</bean>

The Bottom Line

  • Data Loader is very powerful for importing and exporting directly to/from a database
  • When extracting via SOQL that involves a relationship, include the fully qualified name in the SDL file (eg Opportunity.Reason_Lost__c)
  • Simon Fell is my hero!

I received a notice from my friendly Salesforce rep recently, advising that I had gone over my storage limit:

The last time I had heard from Salesforce on such a matter was when Chatter went wild and took me to 493% of my storage allocation! Oh, you’ll also notice from the picture in that article how much my ‘Contact’ record storage had grown over the past year!

This time, my rep kindly offered to raise an invoice for the additional storage space. I’m cheap at heart, so I decided instead to reduce my storage space. Not that I’m upset at Salesforce — I know it’s expensive to store data in their system because it’s all replicated between data centers, backed-up, etc. However, I knew that a lot of my data was unnecessary, and I could just dump it.

To explain, I populate my Salesforce instance from an external system. I had over 220,000 Contact records, of which only a subset were required. So, I decided to remove Contact records:

  • For people who don’t own any of our products (defined in a custom field)
  • For records with no Activities

So, I ran Data Loader (actually, the Mac version which is LexiLoader, compliments of Simon Fell, who reminds people to vote for his Idea that Salesforce produce an official Mac version) and extracted a list of contacts who don’t own a product.

I then ran another Data Loader extract to get a list of all Activity records.

Next, I took the first list of contacts and subtracted any contacts associated with the Activity records. (I couldn’t figure out how to do this in one SOQL statement; suggestions welcome!)
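
(Since writing this, I’ve wondered whether a single SOQL anti-join would have done the job in one hit. Here’s an untested sketch, with Owns_Product__c standing in as a made-up name for our real ‘owns a product’ custom field:)

select Id, EmailAddr__c
from Contact
where Owns_Product__c = false
  and Id not in (select WhoId from Task)
  and Id not in (select WhoId from Event)

The two anti-joins should exclude any Contact referenced by a Task or Event, which is what my manual ‘subtraction’ step was trying to achieve.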

Finally, I took the list of record IDs and asked the Data Loader to do a bulk delete of the records. It took my storage way down:

I must say, the bulk delete operation was extremely fast, since the Data Loader uses the Bulk API for such operations.

The ‘Oops!’ moment

Things seemed fine until a couple of days later, when my users reported that records with Activities had been deleted. I went back and checked my work, only to discover that I had made an error in my “subtraction” step. Instead of taking all contacts and removing the IDs that matched the list of contacts with Activities, I had subtracted the list of Activity IDs themselves. Since these objects have non-overlapping IDs (that is, no Activity ID matches any Contact ID), that operation did nothing.

End result: I deleted a lot of useful records. Gulp!

I did some searching and found rumors that Salesforce could undelete records, but charge a lot of money for the privilege. Not great, since it would cost more than I had originally tried to save!

Next, I investigated the Recycle Bin. Here’s what the official documentation says:

The Recycle Bin link in the sidebar lets you view and restore recently deleted records for 30 days before they are permanently deleted. Your recycle bin record limit is 250 times the Megabytes (MBs) in your storage. For example, if your organization has 1 GB of storage then your limit is 250 times 1000 MB or 250,000 records. If your organization reaches its Recycle Bin limit, Salesforce automatically removes the oldest records if they have been in the Recycle Bin for at least two hours.

My limit actually is 1GB (because we only have a small number of users, so we get the minimum size). Therefore, I get 250,000 records. Given that I deleted about 220,000 records, it means they’re all still in there!

I started to use the Recycle Bin ‘undelete’ function, but doing 200 at a time means I’d need to do it 1000 times!

So, I next tried some Apex in the System Log window, like this:

Contact[] c = [select id from contact where isDeleted = true LIMIT 1000 ALL ROWS];
undelete c;

However, some records didn’t want to undelete because our external system had already Upserted replacements, and undeleting those records would have caused a clash of unique fields. And if this happened, the whole undelete was rolled back rather than allowing through the non-clashing records. Argh! So, I then went to something a bit more sophisticated:

// Get a list of Contact records to undelete
Contact[] contacts = [select id, EmailAddr__c from contact where isDeleted = true limit 1000 ALL ROWS ];

// Put the Email addresses into an array
String[] emails = new String[]{};
for (Contact c : contacts) {
  emails.add(c.EmailAddr__c);
}

// Get a list of 'alive' Contacts (not deleted) that already use that email address
Contact[] alive = [select id, EmailAddr__c from contact where EmailAddr__c in :emails];
system.debug('Found: ' + alive.size());

// Remove Contacts whose email already exists on an 'alive' Contact
if (alive.size() != 0) {
  for (Contact c : alive) {
    for (Integer  i = 0; i < contacts.size(); ++i) {
      if (contacts[i].EmailAddr__c == c.EmailAddr__c) {
        contacts.remove(i);
        break;
      }
    }
  }
  system.debug('Will undelete: ' + contacts.size());

  // Undelete them!
  undelete contacts;
}
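
(Looking back, there may have been a simpler way around the all-or-nothing rollback: the Database methods take an allOrNone flag, so the undelete can keep the records that work and just report the clashes. A rough sketch, untested on this data:)

// Fetch a batch of deleted Contacts (ALL ROWS includes Recycle Bin records)
Contact[] contacts = [select Id from Contact where isDeleted = true limit 1000 ALL ROWS];

// allOrNone = false: clashing records fail individually instead of rolling back the lot
Database.UndeleteResult[] results = Database.undelete(contacts, false);

// Count how many actually came back
Integer restored = 0;
for (Database.UndeleteResult r : results) {
  if (r.isSuccess()) {
    restored++;
  }
}
system.debug('Undeleted: ' + restored + ' of ' + contacts.size());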

I should explain the EmailAddr__c thing. You see, Email is my external ID. However, I couldn’t use the standard Email field as an External ID because I can’t force it to be unique. So, I have a second field for Email address and I populate both. For more details, see my earlier blog post.

Anyway, the above code took about 2 minutes for 1000 records:

10:11:19.031 (31752000)|EXECUTION_STARTED
10:11:19.031 (31788000)|CODE_UNIT_STARTED|[EXTERNAL]|execute_anonymous_apex
10:11:19.032 (32365000)|SOQL_EXECUTE_BEGIN|[1]|Aggregations:0|select ...
10:11:19.074 (74698000)|SOQL_EXECUTE_END|[1]|Rows:1000
10:11:19.202 (202887000)|SOQL_EXECUTE_BEGIN|[6]|Aggregations:0|select ...
10:13:07.266 (108266842000)|SOQL_EXECUTE_END|[6]|Rows:157
10:13:07.267 (108267315000)|USER_DEBUG|[7]|DEBUG|Found: 157
10:13:15.949 (116949306000)|USER_DEBUG|[19]|DEBUG|Will delete: 896
10:13:15.950 (116950156000)|DML_BEGIN|[20]|Op:Undelete|Type:Contact|Rows:896
10:13:19.937 (120937987000)|DML_END|[20]

Most of the time taken was for the 2nd SOQL query (106 seconds), which matches on email. The loop to eliminate duplicates also took time (8 seconds). The undelete itself was relatively quick (4 seconds).

So, I included an ORDER BY clause in my initial query so that it tried older records first. This resulted in fewer email clashes and much faster execution times.
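
For the curious, the tweaked query looked something like this (ordering by CreatedDate is my approximation of ‘older records first’):

Contact[] contacts = [select id, EmailAddr__c from contact
                      where isDeleted = true
                      order by CreatedDate asc
                      limit 1000 ALL ROWS];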

Over the course of a day, I managed to undelete all the records. In fact, it sped up a lot after midnight San Francisco time (which is easy for me because I’m in Australia). Finally, I did my mass delete properly and everybody was happy.

The result:

How to avoid this error in future

Okay, I was doing dangerous stuff and I did it wrong. So how could I avoid this in future? Some ideas:

  • Make a backup first! Extract all data first (but that’s not easy!) or use the “Export Data” function (but that’s not easy to reload).
  • Try it in the Sandbox first. However, we have a Configuration-only Sandbox, without all the data. No good.
  • Test before committing the delete. I did pick random records, but obviously not enough.
  • Get somebody else to review my work before deleting.

The last idea reminds me of a quote from Kernighan and Pike’s famous book The Practice of Programming:

Another effective technique is to explain your code to someone else. This will often cause you to explain the bug to yourself. Sometimes it takes no more than a few sentences, followed by an embarrassed “Never mind, I see what’s wrong. Sorry to bother you.” This works remarkably well; you can even use non-programmers as listeners. One university computer center kept a teddy bear near the help desk. Students with mysterious bugs were required to explain them to the bear before they could speak to a human counselor.

I use that technique a lot at work. I ask somebody to “be my teddy bear”, tell them my problem, suddenly realise the solution, then thank them for their help even though they said nothing. Works every time!

Irony

Oh, here’s some irony. No sooner had I done the above than I received an email from Salesforce telling me that the Recycle Bin limits are being cut:

Dear John,

At salesforce.com, Trust is our top priority, and it is our goal to improve the performance of our Recycle Bin functionality. With that in mind, we are making some changes to the Recycle Bin limits to provide you with a faster user experience.

What is the change and how does it impact me?
We are lowering the Recycle Bin retention period from 30 days to 15 days. The Recycle Bin link in the sidebar will now let you restore recently deleted records for 15 days before they are permanently deleted.

Additionally, we are lowering the Recycle Bin record limit from 250 times your storage to 25 times your storage. For example, if your organization has 1 GB of storage then your limit is 25 times 1000 MB or 25,000 records. If your organization reaches its Recycle Bin limit, Salesforce will automatically remove the oldest records if they have been in the Recycle Bin for at least two hours.

When is this change taking effect?
The lower Recycle Bin retention period will go into effect with the Winter ’12 Release.

The irony is that, had these reduced limits been in place, I would not have been able to recover my deleted data. Phew!

The Bottom Line

  • Test or verify before committing large data-related changes
  • You can’t do undelete via the Bulk API
  • The recycle bin is very big!
  • I’m cheap at heart

My users recently reported a strange problem with an S-Control. Yes, we’re actually still using this outdated capability of Salesforce.com. In fact, when I go to manage the S-Controls, this message appears:

So why are we still using S-Controls? Well, mostly because it still works, but also because I haven’t quite figured out how to convert the functionality to Visualforce. My S-Control is used to show related opportunities, very similar to the technique that I published on the Salesforce Developer Wiki back in 2008 regarding S-Controls and AJAX: Showing a pull-down list of related contacts.

Jon Mountjoy, godfather of the Salesforce developer community, even put a note on it in December 2009 to say that S-Controls are being deprecated. So, point taken!

Anyway, my users were reporting that the S-Control occasionally didn’t work. I couldn’t figure out the problem but, while checking the browser error log, I saw this message:

Uncaught {faultcode:'sf:REQUEST_LIMIT_EXCEEDED', faultstring:'REQUEST_LIMIT_EXCEEDED: TotalRequests Limit exceeded.', detail:{UnexpectedErrorFault:{exceptionCode:'REQUEST_LIMIT_EXCEEDED', exceptionMessage:'TotalRequests Limit exceeded.', }, }, }

A Google search led me to a Pervasive page that sent me to the official Salesforce documentation on API Usage Metering. A quick check on my organization showed that I had exceeded my allowable limit of 28,000 API calls in a 24-hour period.

Salesforce, in selling subscriptions to a multi-tenant system, have wisely included governor limits on various parts of the system to prevent particular customers over-using the system, thereby impacting performance for other users. And, it would seem, we exceeded our limit.

The limit, I should point out, is actually very generous — especially once we worked out how we came to be entitled to 28,000 API calls. It consists of:

  • 1,000 calls per full Salesforce license (we have 8 users, so that’s 8 x 1,000 = 8,000)
  • 200 calls per free Force.com license (we have 100, so that’s 100 x 200 = 20,000)

Yes, most of our allowance actually comes from free licenses (see my earlier blog post on 100 Free Force.com licenses on your normal Salesforce account)!

According to my calculations, my users would have had to display a page every 6 seconds to exceed this limit, so they were unlikely to be the culprit. I had to consider what else was using the API calls, and I realised that it was our automated Data Loader processes. We load information into our Salesforce instance automatically on daily and hourly schedules. If our data volumes had increased, this could have led to an API overrun.

Our Data Loader imports are running in batches of 100 due to a problem with larger batch sizes (again, see my earlier blog post Beware Trigger batches over 100 records). This might have been fixed since then, but the system is still running in batches of 100. To exceed the API limit, this would have required 2,800,000 records to be loaded in 24 hours, which is amazingly high.

Alas, it was true. Due to a problem we had in our system, a lot of our customer data had been updated. This, in turn, triggered an update to Salesforce. (To be honest, I was the cause of the problem — see my post on Atlassian’s news blog, 40,000 apologies from Kitty and friends).

Net result… tens of thousands of records were being loaded into Salesforce and, due to the way we load incremental data, they were loaded several times. Thus, the culprit was found!

Sure enough, now that it has been more than 24 hours since the update, our API Usage count has been decreasing and things are back to normal.

I have also set up an "API Usage Notification" (accessed via Setup) so that I’ll receive an email if this happens again in future. Those Salesforce people think of everything!

The Bottom Line

  • Governor limits protect other users on Salesforce.com
  • They are a good way to stop things that seem to be going wrong
  • If we had governor limits on our systems, a lot of embarrassment could have been avoided, too!
  • I’m still not going to change my S-Control, so there!
November 19, 2010
Tags: Apex, Data Loader

I was in the audience at Dreamforce 2009 when Marc Benioff first demonstrated Chatter. Personally, I was nonplussed, because my workplace keeps all its corporate knowledge on a Confluence wiki, which includes the ability to add comments on pages and track activities.

So, I never bothered delving into Chatter. I didn’t even turn it on. Nonetheless, the dear folks at Salesforce activated it by default. This resulted in some cute ‘feed’ emails and users started adding a picture to their profile.

After a while, the emails turned from ‘fun’ to ‘annoying’ because we have an automated load process that loads hundreds of records several times a day. So, I found the Customize/Chatter settings in Setup and turned off emails. All done and dusted, right? Wrong!

A week or so later, I get a call from my local Salesforce office. “Did you know that your storage has increased dramatically lately?”

No.

So I looked in Storage Usage and was flabbergasted to see this:

Apparently the system had created over 16 million Chatter “Feed Tracked Changes” records, occupying 4GB of storage. That’s quite impressive given that I’ve got a 1GB data quota!

So, I immediately turned off Chatter and waited for my Salesforce contact to get me something called “Chatter Janitor” that could help clean up the mess. In the meantime, I searched discussion boards for a solution, only to find that other people had the same problem and couldn’t figure out how to delete the records!

Attempt 1: Via System Log

Fortunately I came across a Purge Chatter Feed discussion on the Community forums. It shows how to delete feeds via the “Execute Apex” window of the System log. I’ve simplified their example to do the following:


OpportunityFeed[] feed = [Select Id from OpportunityFeed limit 10000];
delete feed;

Unfortunately, it didn’t work for me. Eventually I discovered that the Feed records are only available if Chatter is turned on, so I had to activate it again.

The above bit of code deletes 10,000 records at a time, which is the maximum allowable under the Governor Limits. Unfortunately, with my 16 million records, this would take 1600 executions of the code. That’s a lot of clicking!

I started doing it and things were moving pretty quickly, until my scheduled batch load activated and I ended up with even more Feed records than when I started. Arrgh!

Then, thanks to some hints from David Schach, I found that I could turn off Chatter on specific objects. I hadn’t realised this at first because the option only appears if Chatter is turned on!

Attempt 2: Anonymous Apex

Okay, hitting the “Execute” button in the System Log 1600 times didn’t sound like fun, so I thought I’d check how to automate it. I soon figured out how to call executeAnonymous via an API call. All I’d need to do is repeatedly call the above Apex code.

I used the SoapUI plugin for IntelliJ (you can also use it standalone or in Eclipse), which makes it very easy to create XML for SOAP calls and even automate them using a Groovy script. This generally worked well, but it could still only delete 10,000 records per SOAP call, and it took a while to execute and occasionally timed out. So, this wasn’t going to be the perfect solution, either.

Attempt 3: Batch Apex

I did some research and found that Batch Apex has no governor limits if invoked with a QueryLocator. Well, it actually has a 50 million record limit, but that was good enough for me!

The online documentation is pretty good and I eventually created this class:

global class ZapChatter implements Database.Batchable<sObject>{

  global ZapChatter() {
    System.Debug('In Zap');
  }

  global Database.QueryLocator start(Database.BatchableContext BC) {
    return Database.getQueryLocator('Select Id from OpportunityFeed limit 1000000');
  }

  global void execute(Database.BatchableContext BC, List<sObject> scope) {
    delete scope;
  }

  global void finish(Database.BatchableContext BC) {
  }

}

Batch Apex works by taking a large number of records and then processing them in ‘batches’ (obvious, eh?). So, in the above code, the start method selects 1 million records and then the execute method is called for every batch. It is invoked with:

id batchinstanceid = database.executeBatch(new ZapChatter(), 10000);

This can be executed in the System Log or via the Force.com IDE ‘Anonymous Apex’ facility. The second parameter ("10000") tells Batch Apex to use batches of 10,000 records.
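
While it runs, you can keep an eye on progress from the same window (or the Apex Jobs page in Setup) with a query along these lines:

// batchinstanceid comes from the database.executeBatch() call above
AsyncApexJob job = [select Id, Status, JobItemsProcessed, TotalJobItems, NumberOfErrors
                    from AsyncApexJob where Id = :batchinstanceid];
system.debug(job.Status + ': ' + job.JobItemsProcessed + ' of ' + job.TotalJobItems + ' batches done');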

Things worked, and didn’t work. I found that small batches of 200 or 1000 got executed very quickly. Batch sizes of 10,000 took a long time “in the queue”, taking 3 to 15 minutes between batches, probably a result of other workload in the system.

I then got greedy and tried one batch of 1 million records. This took 50 minutes to start the batch, only to fail with an error "Too many DML rows: 1000000".

I then selected ALL 16 million records and requested a batch size of 10,000. This took 5 hours before it started the batches, with a total of 1674 batches required. I left it overnight but it didn’t run many batches, presumably because large batches are given low priority.

Attempt 4: Deleting via Data Loader

During all this fun, I lodged a Support Case with Salesforce to obtain their advice. They suggested using the Data Loader to export a list of Feed IDs and then load them in a Delete operation. I also discovered that Chatter has to be activated, otherwise Data Loader will not show the Feed objects (eg AccountFeed, OpportunityFeed).

This method did work. However, the speed of deletion was not great. I was only getting a rate of about 100,000 records per hour, probably due to my low upload bandwidth from home. (Yes, this had already occupied my work day, and was seeping into my evening, too!) At that rate, it would still take 160 hours to complete — that’s a full week!

What’s worse, the Data Loader normally works in batches of 200. This would require 80,000 API calls to delete the records, and we have a limit of 28,000 API calls per day. So, that’s 3 days minimum!

Attempt 5: Bulk API

Since I’m using all these new Apex technologies, I then thought I’d try the new Bulk API. It’s basically Data Loader on steroids — same interface, but with a few options turned on in the Settings dialog.

Bingo! Load speed went up dramatically. The Bulk API uses parallel processing and you can watch your job while it loads! In my case, it was running 10 parallel batches, chewing up gobs of “API Active Processing time”. I upped my batch size to 10,000 so my test file of 100,000 records loaded in 10 batches. This is handy, because there is a limit of 1,000 batches in a 24-hour period. So, 16 million records would use 1600 batches and would need to be spread across two days.

Since I’m in Australia and the speed of light somewhat impacts data transfers, I configured the Data Loader’s “Batch mode” to run against our data center in the USA. Attempting to extract 1 million records timed out after 10 minutes before even starting the download, so I dropped down to 100,000 records with a maximum extractionReadSize of 2000 (which is the download batch size). This took 4½ minutes to run. The upload of the delete file took only 6 seconds (wow!) and the job took 7 minutes to run:

I then settled down, deleting in batches of 500,000. Success at last!

The Bottom Line

  • Chatter generates Feed objects when specified fields change
  • If you’re loading lots of records via the API, this might generate lots of Feed records
  • The feed records (eg AccountFeed, ContactFeed, OpportunityFeed) can consume a lot of space
  • If you want this to stop, turn off the individual Feeds (Setup, Customize, Chatter, Feed Tracking) but keep Chatter turned on for now
  • If you’ve only got a few hundred thousand records, it’s easiest to delete them via the System Log
  • If you’ve got millions of records, use the Data Loader and Bulk API to extract then delete them
  • When you’re all done, turn off Chatter. Phew!
Tags: Data Loader, SQL

Oh boy, I love debugging!

And today I really had to debug DataLoader!

I do a lot of stuff with DataLoader — see my previous blogs on the topic. But I’ve always uploaded to Salesforce from a PostgreSQL database — never extracted.

DataLoader allows a job to be configured where a SOQL query is run, the data is extracted from Salesforce and the results are inserted into a database. The documentation is woeful on this whole topic, but fortunately DataLoader comes with sample process-conf.xml and database-conf.xml files that contain some examples.

The basic method is:

  • process-conf.xml defines the job and includes the SOQL in an extractionSOQL parameter
  • An SDL file defines the field mapping between Salesforce and the output
  • database-conf.xml defines the SQL statement to use, plus the parameter types

First hint: The Field names in the SDL file are case-sensitive. Use “ID” instead of “Id” and the field won’t come across.

Second hint: All fields come across in quotes. This totally destroys any attempt to load numbers!

Here’s part of the database-conf.xml that I used:

<bean id="extractMegaphoneQuery"
      class="com.salesforce.dataloader.dao.database.SqlConfig" singleton="true">
    <property name="sqlString">
        <value>
            INSERT INTO renewals_megaphone (
               period, megaphone_date, opportunity_name, reason_lost, owner, expiry_date, product, years_owned)
            VALUES (@period@, @megaphone_date@, @opportunity_name@, @reason_lost@, @owner@,
                    @expiry_date@, @product@, @years_owned@)
        </value>
    </property>
    <property name="sqlParams">
        <map>
            <entry key="period"           value="bigint"/>
            <entry key="megaphone_date"   value="java.sql.Date"/>
            <entry key="opportunity_name" value="java.lang.String"/>
            <entry key="reason_lost"      value="java.lang.String"/>
            <entry key="owner"            value="java.lang.String"/>
            <entry key="expiry_date"      value="java.sql.Date"/>
            <entry key="product"          value="java.lang.String"/>
            <entry key="years_owned"      value="integer"/>
        </map>
    </property>
</bean>

You’ll notice I have two fields that are numbers — period (bigint) and years_owned (integer). When trying to load these fields into my database, I kept getting the message:

Sql error: ERROR: column "period" is of type bigint but expression is of type character varying.
org.postgresql.util.PSQLException: ERROR: column "period" is of type bigint but expression is of type character varying

There was no way that I could convince DataLoader that the field was numeric — I tried lots of java.lang.Integer type entries in the sqlParams, but it didn’t help.

I then changed the output to csvWrite instead of databaseWrite so that I could see the output, and it came through like this:

"PERIOD","MEGAPHONE_DATE","OPPORTUNITY_NAME","REASON_LOST","OWNER","EXPIRY_DATE","PRODUCT","YEARS_OWNED"
"83823.0","2010-06-08","Joe Smith","Invalid","John","2010-03-14","a0624004000qRtkgA2","3.0"

You’ll notice that the first and last columns are meant to be numbers, but they are coming through in quotation marks. That’s what was causing the database load to fail!

So, I updated my SQL to convert the strings to numbers:

    <property name="sqlString">
        <value>
            INSERT INTO renewals_megaphone (
               period, megaphone_date, opportunity_name, reason_lost, owner, expiry_date, product, years_owned)
            VALUES (@period@::numeric, @megaphone_date@, @opportunity_name@, @reason_lost@, @owner@, @expiry_date@, @product@, @years_owned@::numeric)
        </value>
    </property>

This successfully cast the strings into numbers and they finally got stored in the database (which automatically ignored the decimal places when converting to bigint and integer).

I hope this blog post saves other people from similar problems!

The Bottom Line

  • DataLoader can extract data and insert it into a database
  • Numbers are quoted on export and cause database errors
  • I type-cast the string into numeric to enable the database to load it

I was configuring DataLoader to export a list of Opportunities, and I went to select the “Owner Name”. However, only OwnerID is available on an Opportunity.

“No problem!” I think to myself, as I go and create a Custom Field with a formula equal to Owner.Alias.

“What?” I say in surprise. “It won’t let me access a field on the Owner object!”

Mmm. This is strange. Then a Google search reveals an 18-month-old Ideas request to Make “Owner Id” Look up fields available for formulas.

Oh dear.

Well, that’s a shame, but it’s easily solved! I created:

  • A field called Owner_Link__c of type Lookup(User)
  • A Trigger to copy OwnerId to Owner_Link__c when the Owner is changed
  • A test for the Trigger

Trigger:

trigger Update_OwnerLink_on_Owner_Update on Opportunity (before update, before insert) {

  // When 'Owner' field is changed, update 'OwnerLink' too

	// Loop through the incoming records
	for (Opportunity o : Trigger.new) {

		// Has Owner changed?
		if (o.OwnerID != o.Owner_Link__c) {
			o.Owner_Link__c = o.OwnerId;
		}
	}
}

Test:

public with sharing class TriggerTest_OwnerLink {

	static TestMethod void testOwnerLink() {

		// Grab two Users
		User[] users = [select Id from User limit 2];
		User u1 = users[0];
		User u2 = users[1];

		// Create an Opportunity
		System.debug('Creating Opportunity');
		Opportunity o1 = new Opportunity(CloseDate = Date.newInstance(2008, 01, 01), Name = 'Test Opportunity', StageName = 'New', OwnerId = u1.Id);
		insert o1;

		// Test: Owner_Link should be set to user 1
		Opportunity o2 = [select id, OwnerId, Owner_Link__c from Opportunity where Id = :o1.Id];
		System.assertEquals(u1.Id, o2.OwnerId);
		System.assertEquals(u1.Id, o2.Owner_Link__c);

		// Modify Owner
		o2.OwnerId = u2.Id;
		update o2;

		// Test: Owner_Link should be set to user 2
		Opportunity o3 = [select id, OwnerId, Owner_Link__c from Opportunity where Id = :o2.Id];
		System.assertEquals(u2.Id, o3.OwnerId);
		System.assertEquals(u2.Id, o3.Owner_Link__c);
	}
}

This then gave me a new Owner object on my Opportunity on which I could create Formulas:

I could also use it in the DataLoader by referring to Owner_Link__r.Alias.
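
For example, an extraction SOQL along these lines will now pull the Owner’s alias alongside the Opportunity fields (just a sketch; the date filter is for illustration):

select Name, StageName, CloseDate, Owner_Link__r.Alias
from Opportunity
where CloseDate >= 2010-01-01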

Hooray!

Easy to solve, but it’s a shame it was necessary.

The Bottom Line

  • Formulas can’t access Opportunity.Owner fields
  • Create a ‘shadow’ field to hold Owner and populate it via a Trigger

Today I had to create campaigns to track Trade Show attendees. I’ve written previously about creating campaigns from Webex attendee files, but my Trade Show source files were not as friendly. Each Trade Show organiser gave us lists in different formats — the best one contained complete Name, Address, Company, Title and Email information, while the worst had just First Name and Email (yes, really!).

My requirements were:

  • Load information from disparate source documents
  • Create Contacts, not Leads (we don’t use Leads)
  • Don’t overwrite existing Contact records (existing customers are populated from an External system, with more accurate information than Trade Show visitor data)
  • Add them to a Campaign for tracking purposes

Importing Contacts

The first step was quite easy with the help of the Salesforce.com Data Loader (known as LexiLoader on the Mac). It can import Contact records from a CSV file. The trick here, however, was to use INSERT rather than UPSERT. My system is configured to use Email as an External ID for Contacts, so the import will fail for records where there is an existing Contact with the same Email address. This is good — it lets me create new records only where the Contact does not already exist.

Also as part of the import, I added a column for RecordTypeId. This is because I have two Record Types for my contacts:

  • Manually created contacts (which let users edit most fields) — this is the one I selected, and
  • Automatically created contacts (which come from my external system, and don’t let users edit fields since they are slaved from my external system)

I also added a column for AccountId, which is quite important. Any Contact records that aren’t linked to an Account are only visible to the record creator. In our case, we don’t actually use Accounts (I’ll blog about that in future), so I just linked all the new Contacts to a ‘Default Account’ to make them visible.

Adding Contacts to Campaigns

There are lots of ways to add Campaign Members to Campaigns, either from Lead/Contact records or from the Campaign record. In this case, I wanted to add them in bulk, so I used the Manage Members button:

To load in bulk, I wanted to use an “Import File” capability, but I didn’t want to load Leads. So, the Add Members – Import File option was no good, since it only loads records as Leads. Instead, I used the Update Status – Import File option which can not only update a status, but also associate a Contact with a Campaign. In other words, it can take existing Contacts and add them to a Campaign — Perfect!

Unfortunately, Import File wants a file that contains the Salesforce Contact ID. This is a problem, since the ID is generated within the Force.com platform and is not in my CSV file. The online hints for that command suggest creating a Report on campaign members, saving as CSV then importing them to change the status. This is not an option for me since some of the people are already in Salesforce.com as existing Contacts and others I’ve just created. So, there’s no identifier that lets me select them all in a Report.

So, here’s a trick… After using DataLoader, two files are created. A success file contains a copy of all records that were successfully loaded, with a new ID column inserted. I can use this file with the Import command to update all the statuses. This takes care of all the Contacts that were created during the original Insert of Contacts. Fantastic!

But what about the Contacts that already existed, and therefore caused an error during the Insert? Well, the ID is actually supplied in the error message (eg duplicate value found: EmailAddr__c duplicates value on record with id: 0032000000Fs7Xp) and can be extracted using a RIGHT(B2,15) type of spreadsheet command. Just put that value in a new column, import the file and you’re done!

(To slightly simplify things, it’s also possible to load just one file — the original CSV file! Since all records now exist, every row will generate an error. You can then just process the error file rather than separately loading the success and error files.)


I’ll admit that during this process I stuffed up — I imported attendees to the wrong campaign. To remove them, I could have deleted the Campaign and started again. Instead, I used a favourite utility called Mass Delete Anything (there’s a Windows and a Mac version). I just created an appropriate SOQL command to identify the offending records, then Mass Deleted them. Very handy, but also very dangerous. Use with care on your Production systems.
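
In my case the offending records were Campaign Members, so the SOQL I gave it was something like this (the Campaign ID here is just a placeholder):

select Id from CampaignMember where CampaignId = '70120000000AbcD'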

The end result of all this… My Campaigns can now report counts of Total Contacts for the Trade Shows. When combined with a Trigger I’ve got that identifies people when they purchase products, I can track the conversion rate for Trade Show attendees, all without having to manually type any Contact information.

The Bottom Line

  • You can’t import Contacts directly into Campaigns, but you can import a list of existing Contacts to link to a Campaign
  • DataLoader saves the day again, especially with the information saved in its success and error output files
Tags: Data Loader

I love upserts. They are a combination of Inserts and Updates. The key to using Upsert is to nominate a field that contains a unique identifier for the record. While importing, if a record already exists with the same identifier, the record is Updated. If no such record exists, a new record is Inserted.

A good identifier is, of course, the Salesforce ID that is assigned to each record. However, there are two significant problems with using the Salesforce ID:

  • It varies across instances (eg sandbox, production, developer account), which means that you can’t import the same IDs in different systems
  • It is generated by the Force.com platform — you can’t create it yourself

The Good News

Enter the External ID. This is a flag that can be set on a Custom Field to identify it as a Unique Identifier, which allows that field to be used in Upserts.

My favourite use for an External ID is in storing the unique identifier from an External System (eg a customer database). If I load my customers into Salesforce.com, I can then use Upsert to reload them next time without creating duplicate records. The external system doesn’t even need to know that Salesforce exists, and certainly doesn’t have to store the Salesforce IDs.
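
To show the idea in Apex terms (I actually do the load through the Data Loader, but the behaviour is the same), an upsert keyed on a custom External ID field looks roughly like this; Customer_Id__c is a made-up example field:

// Records as they come from the external system; Customer_Id__c is a custom
// External ID field (made-up name for this example)
Contact[] incoming = new Contact[]{
  new Contact(LastName = 'Smith', Customer_Id__c = 'CUST-0001'),
  new Contact(LastName = 'Jones', Customer_Id__c = 'CUST-0002')
};

// Matches on Customer_Id__c: existing Contacts are updated, new ones are inserted
upsert incoming Customer_Id__c;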

The Bad News

I load a lot of customer information, and the best identifier for customers is their Email Address. Unfortunately, the Email field on Contact records is a Standard field and only Custom fields can be defined as an External ID.

To get around this limitation, I’ve created my own Email field (EmailAddr__c) that I define as an External ID. However, this leads to further problems:

  • Salesforce.com uses the normal “Email” field for sending emails (obvious, eh?)
  • Lots of existing lists, views and reports use the standard Email field
  • Things get very confusing when you have more than one ‘Email’ field

So, I created a Workflow that keeps the two fields in sync. I import into the EmailAddr__c field and then get the workflow to update the standard Email field.

I also change the label of the standard Email field to “(Unused Email Field)” so that I don’t accidentally select it in Page Layouts.

What’s that you say? Why don’t I just use the normal Email field, since the Workflow sets it to the same value? Well, the answer lies in the field type. The standard Email field is of type “Email”, which means that Salesforce.com performs validation to ensure the value ‘looks like’ an email address. However, my external system does not have similar checking. Therefore, I want to use the email address from my external system as a unique identifier even if it is a badly formatted address.

I’ve found that by importing directly into the Email field, the Data Loader will reject a record if it is a badly-formed email address, but copying it from my EmailAddr__c field into the Email field seems to avoid this checking. At times I also had the situation where EmailAddr__c had a value, but Email was blank. So, I find it a lot safer to use my custom email field, since I know it will always have a value. (We’ve since improved our external system to do similar validation on the email field, but we’ve still got lots of historical records with the problem.)

The Bottom Line

  • We can now use an Email Address as an External ID
  • We can now Upsert on Email Addresses
  • It’s a little messy, but it works
  • I wish Salesforce.com would let us define Standard fields as External IDs
Tags: Data Loader

I arrived at work today to get a mysterious message from one of my users. Opportunities, for some strange reason, were suddenly appearing in the system and being marked as already Won!

While this is great from a sales perspective (we all like Won opportunities!), it should not have been happening from a data perspective. Perhaps a bit of explanation is in order…

What it Does

My employer (a great Australian software company) sells software. Actually, since people can download and trial our software for free, it would be more accurate to say that we sell software licenses. Each year, we encourage our customers to renew their software maintenance so that they can get support and upgrades. So, our primary use for Salesforce.com is to track Renewal Opportunities.

These opportunities are perfect for automation. Our existing licensing system keeps track of customers, so I use the Salesforce.com Data Loader to import a list of products that have expired and we have dedicated Renewal Specialists who contact the customers to see if they would like to renew. We typically get around 70% renewing, so they must like our software! (Unlike other Enterprise Software companies, they can keep using our software even if they don’t pay annual maintenance.)

How it Works

The basic system is:

  • An SQL view on our customer database looks for expired licenses
  • The Data Loader uses this view to extract information from the database and load it directly into Salesforce as an Opportunity object
  • I use an Upsert operation with my licensing system supplying an External ID. This means that I can load new and existing opportunities in one batch, without my external system even having to know about Salesforce. As long as I supply the same External ID, Salesforce will automatically update existing records or create new ones if they don’t exist.
  • The data is reloaded overnight to show any information that has updated or any Opportunities that have closed. In fact, the only way to mark an Opportunity as Won is via the DataLoader — staff can’t claim a Closed/Won situation unless the external system says it has actually been sold.

So, why did we start getting Opportunities appearing long after expiry, when they had already been won?

Well, I tracked it down to the fact that I only create Opportunities for customers in the Americas region, since that’s where our Renewal Specialists are located. It turned out that some of our customers had incorrect address information, so the system couldn’t work out their Country. My little SQL function that converts Country into Region couldn’t tell that these Opportunities belonged in the Americas region because the right country information was missing.

Fortunately, we’ve had some staff manually cleaning our customer data. It just happened that they fixed the address information for a couple of customers who had expired licenses. This then caused the Opportunities to magically appear in Salesforce — later than they should have. I managed to track this down from some ‘Modified date’ fields we’ve got on our data that I use for incremental loading (more on that topic another day). Plus, the automatic ‘Created by’ field on records gave great information on when records appeared.

The Bottom Line

  • Clean data is good
  • Automation is great, but can miss some things that humans would notice
  • Audit tracking of fields and records can help ‘piece together’ clues in mysterious situations