Tags: Atlassian

The incredible company where I work, Atlassian, has a Salesforce BA/Programmer role open at our San Francisco office. If you want to be our Salesforce Admin, check it out! (Disclaimer: If I refer you and you get hired, I get a reward.)

Oh, and if you refer somebody, you’ll get a reward!

While you’re at it, check out the Core Values video I made a few years ago, or my recent talk at the Atlassian Summit where I presented How we make our wiki sticky:


And take a look at a recent video about Atlassian’s market success.

Oh, and if you’re interested in what I get up to in my spare time, read about my recent cycling trip to China.

The Bottom Line

  • Come work at Atlassian!
  • I look great in lycra!

I was recently invited to be a technical reviewer of a book written specifically to help people study for the Salesforce.com Certified Force.com Developer exam. It’s written by Siddhesh Kabe who writes the May the Force be with you blog.

The book is called Force.com Developer Certification Handbook (DEV401) and is now available through Amazon in both paper and Kindle versions, online via Safari (read some of it online!) and also from the publisher themselves, Packt, in both dead tree and electronic formats.

Force.com Developer Certification Handbook (DEV401)

Fellow technical reviewers include Matthew Botos (he’s also a great photographer!) and Ankit Arora, who writes the ForceGuru blog, both of whom were recently appointed Force.com MVPs.

Here’s an overview of the book:

CHAPTER 1, Getting Started with Force.com, gives an overview of Force.com concepts and leads the reader through set-up of a free Developer Account.

CHAPTER 2, Creating a Database on Force.com, shows how to create objects and relationships within the Force.com data model.

CHAPTER 3, User Interface, explains how to create tabs, pages, page layouts and Visualforce elements. There’s a lot of knowledge crammed into this chapter, so don’t expect to understand it all at once.

CHAPTER 4, Implementing Business Logic, covers validation rules, formulas, roll-up summaries, workflow and approval processes in pretty good detail.

CHAPTER 5, Data Management, looks at inter-object relationships and the Data Loader, used for importing and exporting data.

CHAPTER 6, Analytics and Reporting, shows how to create Reports and Dashboards.

CHAPTER 7, Application Administration, explains access rights, roles, profiles and passwords.

CHAPTER 8 contains a 66-question practice exam, with solutions.

If you’re looking for good Force.com books, also see my review of Jason Ouellette’s book, which has now come out with an updated 2nd Edition.

The Bottom Line

  • It’s a new book to help with certification
  • Just reading the book isn’t enough to pass; you also need real-world hands-on experience
  • Have a brief read of it online
Tags: Apex, SOQL

I had to transfer opportunities and tasks between users today.

I noticed that the “Mass Transfer Records” option didn’t let me do it. I also found a couple of AppExchange apps that offered to do it, but I hate the hassle of installing foreign code into my own system.

So, I just did it myself via the System Log. I’m an old-fashioned sort of guy, so I even did it via the “old” version of the System Log. In case it’s useful for other people, here’s my code:

Opportunity[] opps = [select Id from Opportunity where OwnerId = '005200000010rF5' and StageName = 'Open'];
for (Opportunity o : opps) {
  o.OwnerId = '00520000003EtsZ';
}
update opps;

Of course, you’ll probably have different criteria than this, but it’s pretty straightforward.

Here’s how I transferred tasks that were ‘Not Started’:

Task[] tasks = [select Id from Task where OwnerId = '005200000010rF5' and Status = 'Not Started'];
for (Task t : tasks) {
  t.OwnerId = '00520000003EtsZ';
}
update tasks;

The Bottom Line

  • You can’t transfer Opportunities or Tasks using the built-in Mass Transfer Records tool
  • You can do it easily via some quick Anonymous Apex in the System Log
  • I’m an old-fashioned kinda boy
Tags: Data Loader, SOQL

Our Salesforce instance loads quite a bit of data via the Data Loader (so much, that it sometimes screws things up). This is done in automated batches via the command line. Better yet, the information is extracted directly from our JDBC database — no disk files are involved! To top it off, it even runs on a Linux system (which was a little difficult because Data Loader is only distributed as a Windows executable!).

We also export some data from Salesforce, again inserting it directly into a database.

My challenge for today was to extract some information from the Opportunity Stage History (kept in the OpportunityHistory object). Specifically, I wanted to extract some Opportunity data together with the date that the Opportunity’s Stage was set to ‘Lost’. This required ‘joining’ between the Opportunity and OpportunityHistory objects.

I referred to the trusty SOQL Relationship Queries documentation and wound up writing a query on the OpportunityHistory object that also included data from the ‘parent’ Opportunity object:

select
  Opportunity.Reason_Lost__c, -- My custom fields
  Opportunity.Expiry_date__c,
  Opportunity.Product_Family__c,
  CreatedDate    -- Date on the Stage History record
from OpportunityHistory
where
  StageName = 'Lost'  -- On OpportunityHistory
  and Opportunity.Reason_lost__c != ''
  and Opportunity.CreatedDate >= 2009-11-01T00:00:00Z  -- I always hate having to write dates like this!

This very nicely returned me rows from Opportunity objects together with the date that the Stage was changed to ‘Lost’.

However, I had a lot of trouble getting it to load, and then I realised that my SDL file (used to map fields) also had to have the ‘Opportunity.’ prefix!

In export_map_reason_lost.sdl:

Opportunity.Reason_Lost__c=reason_lost
Opportunity.Product_Family__c=product
Opportunity.Expiry_Date__c=expiry_date
CreatedDate=lost_date

At this point, I should give a shout-out to my favourite Salesforce employee, Simon Fell (see my interview with him at Dreamforce 2009). He has written a great Mac utility called Soql Xplorer that makes writing SOQL a snap!

To assist any readers who are trying to nut out how to use Data Loader to push/pull data to/from a database, here are the relevant entries in my XML files:

In process-conf.xml:

    <bean id="extractReasonLost"
          class="com.salesforce.dataloader.process.ProcessRunner"
          singleton="false">
        <description>Extract Lost Opportunities to discover Reason Lost</description>
        <property name="name" value="extractReasonLostName"/>
        <property name="configOverrideMap">
            <map>
                <entry key="sfdc.endpoint" value="https://emea.salesforce.com"/>
                <entry key="sfdc.username" value="john.rotenstein@atlassian.com"/>
                <entry key="sfdc.password" value="secret"/>
                <entry key="sfdc.timeoutSecs" value="600"/>
                <entry key="sfdc.loadBatchSize" value="100"/>
                <entry key="sfdc.entity" value="Opportunity"/>
                <entry key="process.enableLastRunOutput" value="false" />
                <entry key="sfdc.extractionRequestSize" value="500"/>
                <entry key="sfdc.extractionSOQL" value="select Opportunity.Reason_Lost__c, Opportunity.Expiry_date__c,
                                 Opportunity.Product_Family__c, CreatedDate
                                 from OpportunityHistory where StageName = 'Lost' and Opportunity.Reason_lost__c != ''
                                 and Opportunity.CreatedDate >= 2009-11-01T00:00:00Z"/>
                <entry key="process.operation" value="extract"/>
                <entry key="process.mappingFile" value="export_map_reason_lost.sdl"/>
                <entry key="dataAccess.type" value="databaseWrite"/>
                <entry key="dataAccess.name" value="extractReasonLostBean"/>
            </map>
        </property>
    </bean>

In database-conf.xml:

<bean id="extractReasonLostBean"
      class="com.salesforce.dataloader.dao.database.DatabaseConfig"
      singleton="true">
    <property name="sqlConfig"  ref="extractReasonLostQuery"/>
    <property name="dataSource" ref="server_name"/>
</bean>

<bean id="extractReasonLostQuery"
      class="com.salesforce.dataloader.dao.database.SqlConfig" singleton="true">
    <property name="sqlString">
        <value>
            INSERT INTO renewals_lost (
               period, reason_lost, expiry_date, product, lost_date)
            VALUES (@period@::numeric, @reason_lost@, @expiry_date@, @product@, @lost_date@ )
        </value>
    </property>
    <property name="sqlParams">
        <map>
            <entry key="period"           value="integer"/>
            <entry key="reason_lost"      value="java.lang.String"/>
            <entry key="expiry_date"      value="java.sql.Date"/>
            <entry key="product"          value="java.lang.String"/>
            <entry key="lost_date"        value="java.sql.Date"/>
        </map>
    </property>
</bean>

The Bottom Line

  • Data Loader is very powerful for importing and exporting directly to/from a database
  • When extracting via SOQL that involves a relationship, include the fully qualified name in the SDL file (eg Opportunity.Stage)
  • Simon Fell is my hero!

I received a notice from my friendly Salesforce rep recently, advising that I had gone over my storage limit:

The last time I had heard from Salesforce on such a matter was when Chatter went wild and took me to 493% of my storage allocation! Oh, you’ll also notice from the picture in that article how much my ‘Contact’ record storage had grown over the past year!

This time, my rep kindly offered to raise an invoice for the additional storage space. I’m cheap at heart, so I decided instead to reduce my storage space. Not that I’m upset at Salesforce — I know it’s expensive to store data in their system because it’s all replicated between data centers, backed-up, etc. However, I knew that a lot of my data was unnecessary, and I could just dump it.

To explain, I populate my Salesforce instance from an external system. I had over 220,000 Contact records, of which only a subset were required. So, I decided to remove Contact records:

  • For people who don’t own any of our products (defined in a custom field)
  • For records with no Activities

So, I ran Data Loader (actually, the Mac version which is LexiLoader, compliments of Simon Fell, who reminds people to vote for his Idea that Salesforce produce an official Mac version) and extracted a list of contacts who don’t own a product.

I then ran another Data Loader extract to get a list of all Activity records.

Next, I took the first list of contacts and subtracted any contacts associated with the Activity records. (I couldn’t figure out how to do this in one SOQL statement, suggestions welcome!)
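For the record, a SOQL anti-join might do it in one pass, something like the sketch below. Owns_Product__c is just a stand-in for our real “owns a product” custom field, and I haven’t tested this against our data, so treat it as a starting point rather than gospel (SOQL only allows two of these NOT IN subqueries per query):

select Id
from Contact
where Owns_Product__c = false
  and Id not in (select WhoId from Task)
  and Id not in (select WhoId from Event)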

Finally, I took the list of record IDs and asked the Data Loader to do a bulk delete of the records. It took my storage way down:

I must say, the bulk delete operation was extremely fast, since the Data Loader uses the Bulk API for such operations.

The ‘Oops!’ moment

Things seemed fine until a couple of days later, when my users reported that records with Activities had been deleted. I went back and checked my work, only to discover that I had made an error in my “subtraction” step. Instead of taking all contacts and removing the IDs that matched the list of contacts that had Activities, I subtracted the list of Activities themselves. Since these objects have non-overlapping IDs (that is, no Activity ID matched any Contact ID), that operation did nothing.

End result: I deleted a lot of useful records. Gulp!

I did some searching and found rumors that Salesforce can undelete records, but charges a lot of money for the privilege. Not great, since it would have cost more than I had originally tried to save!

Next, I investigated the Recycle Bin. Here’s what the official documentation says:

The Recycle Bin link in the sidebar lets you view and restore recently deleted records for 30 days before they are permanently deleted. Your recycle bin record limit is 250 times the Megabytes (MBs) in your storage. For example, if your organization has 1 GB of storage then your limit is 250 times 1000 MB or 250,000 records. If your organization reaches its Recycle Bin limit, Salesforce automatically removes the oldest records if they have been in the Recycle Bin for at least two hours.

My limit actually is 1GB (because we only have a small number of users, so we get the minimum size). Therefore, I get 250,000 records. Given that I deleted about 220,000 records, it means they’re all still in there!

I started to use the Recycle Bin ‘undelete’ function, but doing 200 at a time means I’d need to do it 1000 times!

So, I next tried some Apex in the System Log window, like this:

Contact[] c = [select id from contact where isDeleted = true LIMIT 1000 ALL ROWS];
undelete c;

However, some records didn’t want to undelete because our external system had already Upserted replacements, and undeleting those records would have caused a clash of unique fields. And if this happened, the whole undelete was rolled back rather than allowing through the non-clashing records. Argh! (In hindsight, there’s a simpler partial-success option; see the sketch after the code below.) So, I then went to something a bit more sophisticated:

// Get a list of Contact records to undelete
Contact[] contacts = [select id, EmailAddr__c from contact where isDeleted = true limit 1000 ALL ROWS ];

// Put the Email addresses into an array
String[] emails = new String[]{};
for (Contact c : contacts) {
  emails.add(c.EmailAddr__c);
}

// Get a list of 'alive' Contacts (not deleted) that already use that email address
Contact[] alive = [select id, EmailAddr__c from contact where EmailAddr__c in :emails];
system.debug('Found: ' + alive.size());

// Remove clashing Contacts from the list, leaving only those safe to undelete
if (alive.size() != 0) {
  for (Contact c : alive) {
    for (Integer  i = 0; i < contacts.size(); ++i) {
      if (contacts[i].EmailAddr__c == c.EmailAddr__c) {
        contacts.remove(i);
        break;
      }
    }
  }
  system.debug('Will undelete: ' + contacts.size());

  // Undelete them!
  undelete contacts;
}
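In hindsight, a less fiddly option might have been partial-success DML. If I’m remembering the Database methods correctly, passing allOrNone=false lets the clean records through and simply reports the clashing ones. Something along these lines; I didn’t try it at the time, so consider it an untested sketch:

// Untested sketch: undelete what we can, and log whatever fails (eg duplicate external IDs)
Contact[] contacts = [select Id from Contact where isDeleted = true limit 1000 ALL ROWS];
Database.UndeleteResult[] results = Database.undelete(contacts, false);
for (Integer i = 0; i < results.size(); i++) {
  if (!results[i].isSuccess()) {
    System.debug('Skipped ' + contacts[i].Id + ': ' + results[i].getErrors()[0].getMessage());
  }
}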

I should explain the EmailAddr__c thing. You see, Email is my external ID. However, I couldn’t use the standard Email field as an External ID because I can’t force it to be unique. So, I have a second field for the Email address and I populate them both. For more details, see my earlier blog post.

Anyway, the above code took about 2 minutes for 1000 records:

10:11:19.031 (31752000)|EXECUTION_STARTED
10:11:19.031 (31788000)|CODE_UNIT_STARTED|[EXTERNAL]|execute_anonymous_apex
10:11:19.032 (32365000)|SOQL_EXECUTE_BEGIN|[1]|Aggregations:0|select ...
10:11:19.074 (74698000)|SOQL_EXECUTE_END|[1]|Rows:1000
10:11:19.202 (202887000)|SOQL_EXECUTE_BEGIN|[6]|Aggregations:0|select ...
10:13:07.266 (108266842000)|SOQL_EXECUTE_END|[6]|Rows:157
10:13:07.267 (108267315000)|USER_DEBUG|[7]|DEBUG|Found: 157
10:13:15.949 (116949306000)|USER_DEBUG|[19]|DEBUG|Will delete: 896
10:13:15.950 (116950156000)|DML_BEGIN|[20]|Op:Undelete|Type:Contact|Rows:896
10:13:19.937 (120937987000)|DML_END|[20]

Most of the time taken was for the 2nd SOQL query (106 seconds), which matches on email. The loop to eliminate duplicates also took time (8 seconds). The undelete itself was relatively quick (4 seconds).

So, I included an ORDER BY clause in my initial query that tried older records first. This resulted in fewer email clashes, and much faster execution times.
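The tweak itself was tiny; the first query just gained an ORDER BY, roughly like this (assuming CreatedDate is the right measure of ‘older’):

Contact[] contacts = [select Id, EmailAddr__c from Contact
                      where isDeleted = true
                      order by CreatedDate asc
                      limit 1000 ALL ROWS];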

Over the course of a day, I managed to undelete all the records. In fact, it sped up a lot after midnight San Francisco time (which is easy for me because I’m in Australia). Finally, I did my mass delete properly and everybody was happy.

The result:

How to avoid this error in future

Okay, I was doing dangerous stuff and I did it wrong. So how could I avoid this in future? Some ideas:

  • Make a backup first! Extract all data first (but that’s not easy!) or use the “Export Data” function (but that’s not easy to reload).
  • Try it in the Sandbox first. However, we have a Configuration-only Sandbox, without all the data. No good.
  • Test before committing the delete. I did pick random records, but obviously not enough.
  • Get somebody else to review my work before deleting.

The last idea reminds me of a quote in Kernighan and Pike’s famous book The Practice of Programming:

Another effective technique is to explain your code to someone else. This will often cause you to explain the bug to yourself. Sometimes it takes no more than a few sentences, followed by an embarrassed “Never mind, I see what’s wrong. Sorry to bother you.” This works remarkably well; you can even use non-programmers as listeners. One university computer center kept a teddy bear near the help desk. Students with mysterious bugs were required to explain them to the bear before they could speak to a human counselor.

I use that technique a lot at work. I ask somebody to “be my teddy bear”, tell them my problem, suddenly realize the solution, then thank them for their help even though they said nothing. Works every time!

Irony

Oh, here’s some irony. No sooner had I done the above than I received an email from Salesforce telling me that Recycle Bin limits are being cut:

Dear John,

At salesforce.com, Trust is our top priority, and it is our goal to improve the performance of our Recycle Bin functionality. With that in mind, we are making some changes to the Recycle Bin limits to provide you with a faster user experience.

What is the change and how does it impact me?
We are lowering the Recycle Bin retention period from 30 days to 15 days. The Recycle Bin link in the sidebar will now let you restore recently deleted records for 15 days before they are permanently deleted.

Additionally, we are lowering the Recycle Bin record limit from 250 times your storage to 25 times your storage. For example, if your organization has 1 GB of storage then your limit is 25 times 1000 MB or 25,000 records. If your organization reaches its Recycle Bin limit, Salesforce will automatically remove the oldest records if they have been in the Recycle Bin for at least two hours.

When is this change taking effect?
The lower Recycle Bin retention period will go into effect with the Winter ’12 Release.

The irony is that, had these reduced limits been in place, I would not have been able to recover my deleted data. Phew!

The Bottom Line

  • Test or verify before committing large data-related changes
  • You can’t do undelete via the Bulk API
  • The recycle bin is very big!
  • I’m cheap at heart

Okay, this post isn’t exactly about Force.com, but it is cloud-related.

After some extremely heavy rain here in Sydney (wettest July ever recorded), a friend asked me about online backup options. It seems that the threat of flooding led him to think about protecting his data. At least he thought about it before he lost any data!

As it happens, I had recently researched this topic for my own backups. I always do nightly backups to a second hard disk, but I had wanted some extra protection against theft or fire, so uploading to the Internet sounded like a good option. I looked at various online backup providers like Mozy and Jungle Disk but noticed that services like DropBox were also promoting themselves as a form of backup. I found this interesting, because I see DropBox as a synchronization tool rather than a backup tool, but I investigated the option nonetheless.

What I found changed the way I view backups. Rather than thinking of backups as batch copies with full/incremental options, I came to appreciate that synchronization tools actually have advantages over traditional backup methods:

  • They work instantly, rather than in batches
  • The concept of full vs incremental backups is no longer relevant since they’re always updating incrementally
  • They can handle multiple computers
  • Web access is extremely useful

My selection set quickly came down to DropBox vs SugarSync, which are both “sync to the cloud” tools.

DropBox

I love DropBox and it seems that everyone at Atlassian (my workplace) loves it too. It’s great for synchronizing files between computers. I have used it for years, primarily to move files between home and work. I just drop a file into the DropBox folder and it automatically copies up to the cloud. It automatically syncs to my other computer (or, if the other computer is off, it does so as soon as it is turned on).

I love playing games with this technology. If I have two such computers in one room (eg a laptop and desktop), I delete a file on one computer and see how long it takes to delete on the other (Answer: it’s effectively instant). I’ve also deleted files via the provided web interface, which instantly deletes the file from the computer I’m using. DropBox also keeps copies of deleted files for 30 days for recovery.

DropBox is renowned for its simplicity. For example, it simply syncs anything put into the DropBox folder. However, this simplicity made DropBox less-than-perfect for my backup requirements. I’d like to specify folders anywhere on my computer that I’d like copied — my documents, my photos, my work notes. They’re all in different places and I didn’t want to change my work habits and move them all into a single DropBox directory.

SugarSync

I also came across SugarSync, which is very similar to DropBox. There are probably many differences, but the most significant for me were:

  • The ability to specify any folders to be synchronized
  • The ability to sync multiple computers independently
  • Not having to sync to other computers

The folder bit is self-explanatory. I can select any folders/subfolders and tell SugarSync to sync them to the cloud.

The multi-computer aspect turned out to be most handy because I can sync my home computer to the ‘cloud’ and I can also sync my work computer to the cloud. The files are kept separately and are identified by the computer from whence they came.

Here’s an example… If I’m at home and I want a file from my work computer, I can access the SugarSync web interface and download the file. I don’t need to sync whole folders from one computer to the other. In fact, I specifically don’t want this to happen — I don’t want my personal home files copied to my work computer (which administrators can access) and I don’t want all my work files copied to my home (because they’re mostly irrelevant for my home life). However, I do want the ability to access the files if necessary.

Also, the storage space purchased on SugarSync is shared between all computer uploads, without needing separate accounts for each computer.

The Choice

I decided to go with SugarSync due to the above differences, but also because I got a limited-time offer that gave a 50% discount (yes, pricing matters!). I received that offer by simply being a user of the free 5GB plan, so I’d at least recommend signing up for that to get any future special offers.

I’m very happy with the way it has worked out. It is rare that I have to access files from other computers, but it does happen once or twice a month. I also enjoy the fact that I can access my photos via web browser, which is very handy when I’m visiting friends and want to show off. (Both DropBox and SugarSync have the ability to view photos via the web and to share photo folders with other people.)

I enjoy the peace of mind whenever I save a file and see the ‘sync’ icon activate, which tells me the files are already being copied elsewhere.

So, I would highly recommend using sync tools as a backup system, since they have several benefits over traditional backup. They shouldn’t be your only form of backup — I would still recommend a local hard disk backup, which has the advantage of being able to restore a computer to bootable state if something goes wrong (which the sync tools can’t do).

If you’re going to look at the above tools, feel free to use the links below. They’ll give you extra benefits and will give me a reward, too (full disclosure!). I guess it’s cheaper than advertising!

DropBox: Free 2GB DropBox account, plus bonus 250MB via this link

SugarSync: Free 5GB account, plus 500MB extra, or 10GB extra on a paid account

Oh, and remember — paper is a perfectly good form of backup, especially for short documents. It’s hard to erase and survives for years!

If you’ve found other good backup options, feel free to leave a comment!

The Bottom Line

  • Backups don’t have to be done in batches
  • Sync tools offer benefits over traditional backup techniques
  • Always have multiple forms of backup

My users recently reported a strange problem with an S-Control. Yes, we’re actually still using this outdated capability of Salesforce.com. In fact, when I go to manage the S-Controls, this message appears:

So why are we still using S-Controls? Well, mostly because it still works, but also because I haven’t quite figured out how to convert the functionality to Visualforce. My S-Control is being used to show related opportunities, very similar to the technique that I published on the Salesforce Developer Wiki back in 2008 regarding S-Controls and AJAX: Showing a pull-down list of related contacts.

Jon Mountjoy, godfather of the Salesforce developer community, even put a note on it in December 2009 to say that S-Controls are being deprecated. So, point taken!

Anyway, my users were reporting that the S-Control occasionally didn’t work. I couldn’t figure out the problem but, while checking the browser error log, I saw this message:

Uncaught {faultcode:'sf:REQUEST_LIMIT_EXCEEDED', faultstring:'REQUEST_LIMIT_EXCEEDED: TotalRequests Limit exceeded.', detail:{UnexpectedErrorFault:{exceptionCode:'REQUEST_LIMIT_EXCEEDED', exceptionMessage:'TotalRequests Limit exceeded.', }, }, }

A Google search led me to a Pervasive page that sent me to the official Salesforce documentation on API Usage Metering. A quick check on my organization showed that I had exceeded my allowable limit of 28,000 API calls in a 24-hour period.

Salesforce, in selling subscriptions to a multi-tenant system, have wisely included governor limits on various parts of the system to prevent particular customers from over-using it and thereby impacting performance for other users. And, it would seem, we exceeded our limit.

The limit, I should point out, is actually very generous, especially once we realised how we came to be entitled to 28,000 API calls. It consists of:

  • 1,000 calls per full Salesforce license (we have 8 users, so that’s 8 x 1,000 = 8,000)
  • 200 calls per free Force.com license (we have 100, so that’s 100 x 200 = 20,000)

Yes, most of our allowance actually comes from free licenses (see my earlier blog post on 100 Free Force.com licenses on your normal Salesforce account)!

According to my calculations, my users would have had to display a page every 6 seconds to exceed this limit (28,000 calls spread over 24 hours averages out to roughly one call every 3 seconds). This was unlikely to be the culprit. So, I had to consider what else was using the API calls and I realised that it was our automated Data Loader processes. We load information into our Salesforce instance automatically each day and hour. If our data volumes had increased, this could have led to an API overrun.

Our Data Loader imports run in batches of 100 due to a problem with larger batch sizes (again, see my earlier blog post Beware Trigger batches over 100 records). This might have been fixed since then, but the system is still running in batches of 100. To exceed the API limit, this would have required 2,800,000 records (28,000 calls x 100 records per call) to be loaded in 24 hours, which is amazingly high.

Alas, it was true. Due to a problem we had in our system, a lot of our customer data had been updated. This, in turn, triggered an update to Salesforce. (To be honest, I was the cause of the problem — see my post on Atlassian’s news blog, 40,000 apologies from Kitty and friends).

Net result… tens of thousands of records were being loaded into Salesforce and, due to the way we load incremental data, they were loaded several times. Thus, the culprit was found!

Sure enough, now that it has been more than 24 hours since the update, our API Usage count has been decreasing and things are back to normal.

I have also set up an “API Usage Notification” (accessed via Setup) so that I’ll receive an email if this happens again in future. Those Salesforce people think of everything!

The Bottom Line

  • Governor limits protect other users on Salesforce.com
  • They are a good way to stop things that seem to be going wrong
  • If we had governor limits on our systems, a lot of embarrassment could have been avoided, too!
  • I’m still not going to change my S-Control, so there!
Tags: Atlassian

I spend my working life at Atlassian, an Australian software development firm that’s building a reputation for its products and work culture.

If you’re interested in learning more, take a look at my entry into the Management 2.0 M-Prize: It’s the Culture, Stupid! How Atlassian maintains an open Information Culture.

It’s a competition, so feel free to rate it!

The topic relates to a presentation I gave at the recent Atlassian Summit in June 2011.

Here’s a video of my presentation. If you watch long enough, you’ll see me strip off T-shirts in choreographed musical routines. Sunglasses recommended. :)

Living in a Microsoft-Free World: Information Management at Atlassian

The Bottom Line

  • Atlassian’s a cool place to work (we’re hiring!)
  • Don’t let me get in front of an audience
November 19, 2010
Tags: Apex, Data Loader

I was in the audience at Dreamforce 2009 when Marc Benioff first demonstrated Chatter. Personally, I was nonplussed because my workplace keeps all its corporate knowledge on a Confluence wiki, which includes the ability to add comments on pages and track activities.

So, I never bothered delving into Chatter. I didn’t even turn it on. Nonetheless, the dear folks at Salesforce activated it by default. This resulted in some cute ‘feed’ emails and users started adding a picture to their profile.

After a while, the emails turned from ‘fun’ to ‘annoying’ because we have an automated load process that loads hundreds of records several times a day. So, I found the Customize/Chatter settings in Setup and turned off emails. All done and dusted, right? Wrong!

A week or so later, I get a call from my local Salesforce office. “Did you know that your storage has increased dramatically lately?”

No.

So I looked in Storage Usage and was flabbergasted to see this:

Apparently the system had created over 16 million Chatter “Feed Tracked Changes” records, occupying 4GB of storage. That’s quite impressive given that I’ve got a 1GB data quota!

So, I immediately turned off Chatter and waited for my Salesforce contact to get me something called “Chatter Janitor” that could help clean up the mess. In the meantime, I searched discussion boards for a solution, only to find that other people had the same problem and couldn’t figure out how to delete the records!

Attempt 1: Via System Log

Fortunately I came across a Purge Chatter Feed discussion on the Community forums. It shows how to delete feeds via the “Execute Apex” window of the System log. I’ve simplified their example to do the following:


OpportunityFeed[] feed = [Select Id from OpportunityFeed limit 10000];
delete feed;

Unfortunately, it didn’t work for me. Eventually I discovered that the Feed records are only available if Chatter is turned on, so I had to activate it again.

The above bit of code deletes 10,000 records at a time, which is the maximum allowable under the Governor Limits. Unfortunately, with my 16 million records, this would take 1600 executions of the code. That’s a lot of clicking!

I started doing it and things were moving pretty quickly, until my scheduled batch load activated and I ended up with even more Feed records than when I started. Arrgh!

Then, thanks to some hints from David Schach, I found that I could turn off Chatter on specific objects. I hadn’t realised this at first because the option only appears if Chatter is turned on!

Attempt 2: Anonymous Apex

Okay, hitting the “Execute” button in the System Log 1600 times didn’t sound like fun, so I thought I’d check how to automate it. I soon figured out how to call executeAnonymous via an API call. All I’d need to do is repeatedly call the above Apex code.

I used the SoapUI plugin for Intellij (you can also use it standalone or in Eclipse), which makes it very easy to create XML for SOAP calls and even automate them using a Groovy Script. This worked generally well, but could still only delete 10,000 records per SOAP call and it was taking a while to execute and occasionally timed-out. So, this wasn’t going to be the perfect solution, either.

Attempt 3: Batch Apex

I did some research and found that Batch Apex has no governor limits if invoked with a QueryLocator. Well, it actually has a 50 million record limit, but that was good enough for me!

The online documentation is pretty good and I eventually created this class:

global class ZapChatter implements Database.Batchable<sObject>{

  global ZapChatter() {
    System.Debug('In Zap');
  }

  global Database.QueryLocator start(Database.BatchableContext BC) {
    return Database.getQueryLocator('Select Id from OpportunityFeed limit 1000000');
  }

  global void execute(Database.BatchableContext BC, List<sObject> scope) {
    delete scope;
  }

  global void finish(Database.BatchableContext BC) {
  }

}

Batch Apex works by taking a large number of records and then processing them in ‘batches’ (obvious, eh?). So, in the above code, the start method selects 1 million records and then the execute method is called for every batch. It is invoked with:

id batchinstanceid = database.executeBatch(new ZapChatter(), 10000);

This can be executed in the System Log or via the Force.com IDE ‘Anonymous Apex’ facility. The second parameter (“10000”) tells Batch Apex to use batches of 10,000 records.

Things worked, and didn’t work. I found that small batches of 200 or 1000 got executed very quickly. Batch sizes of 10,000 took a long time “in the queue”, taking 3 to 15 minutes between batches, probably a result of other workload in the system.

I then got greedy and tried one batch of 1 million records. This took 50 minutes to start the batch, only to fail with an error “Too many DML rows: 1000000”.

I then selected ALL 16 million records and requested a batch size of 10,000. This took 5 hours before it started the batches, with a total of 1674 batches required. I left it overnight but it didn’t run many batches, presumably because large batches are given low priority.

Attempt 4: Deleting via Data Loader

During all this fun, I lodged a Support Case with Salesforce to obtain their advice. They suggested using the Data Loader to export a list of Feed IDs and then load them in a Delete operation. I also discovered that Chatter has to be activated, otherwise Data Loader will not show the Feed objects (eg AccountFeed, OpportunityFeed).

This method did work. However, the speed of deletion was not great. I was only getting a rate of about 100,000 records per hour, probably due to my low upload bandwidth from home. (Yes, this had already occupied my work day, and was seeping into my evening, too!) At that rate, it would still take 160 hours to complete — that’s a full week!

What’s worse, the Data Loader normally works in batches of 200. This would require 80,000 API calls to delete the records, and we have a limit of 28,000 API calls per day. So, that’s 3 days minimum!

Attempt 5: Bulk API

Since I’m using all these new Apex technologies, I then thought I’d try the new Bulk API. It’s basically Data Loader on steroids — same interface, but with a few options turned on in the Settings dialog.

Bingo! Load speed went up dramatically. The Bulk API uses parallel processing and you can watch your job while it loads! In my case, it was loading 10 parallel batches, chewing up gobs of “API Active Processing time”. I upped my batch size to 10,000 so my test file of 100,000 records loaded in 10 batches. This is handy, because there is a limit of 1,000 batches in a 24-hour period. So, 16 million records would use 1600 batches and would need to be spread across two days.
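If you’re driving the Data Loader in batch mode rather than via the Settings dialog, I believe the same switches live in process-conf.xml, with key names something like these (quoted from memory, so check them against your Data Loader version):

    <entry key="sfdc.useBulkApi" value="true"/>
    <entry key="sfdc.bulkApiSerialMode" value="false"/> <!-- false = parallel batches -->
    <entry key="sfdc.loadBatchSize" value="10000"/>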

Since I’m in Australia and the speed of light somewhat impacts data transfers, I configured the Data Loader’s “Batch mode” to work from our data center in the USA. Attempting to extract 1 million records timed out after 10 minutes before even starting the download, so I dropped down to 100,000 records with a maximum extraction request size (sfdc.extractionRequestSize) of 2000, which is the download batch size. This took 4½ minutes to run. The upload took only 6 seconds (wow!) and the delete job itself took 7 minutes to run:

I then settled down, deleting in batches of 500,000. Success at last!

The Bottom Line

  • Chatter generates Feed objects when specified fields change
  • If you’re loading lots of records via the API, this might generate lots of Feed records
  • The feed records (eg AccountFeed, ContactFeed, OpportunityFeed) can consume a lot of space
  • If you want this to stop, turn off the individual Feeds (Setup, Customize, Chatter, Feed Tracking) but keep Chatter turned on for now
  • If you’ve only got a few hundred thousand records, it’s easiest to delete them via the System Log
  • If you’ve got millions of records, use the Data Loader and Bulk API to extract then delete them
  • When you’re all done, turn off Chatter. Phew!
Tags: people

Back in 2000, I joined a dot-com darling called Ariba. Well, their success faded pretty quickly and I often wondered if I should have taken my other job offer, from Siebel (now a part of Oracle and a competitor to Salesforce.com). One measure of relative career goodness I use is the number of job ads available for a particular technology, and Siebel always had more job ads than Ariba.

So, how handy is it to have Salesforce.com experience on your resume?

Well, a quick look at Craigslist didn’t reveal too many openings, but I did receive an email from Valerie Day, who works in ‘Talent Acquisition’ for Deloitte. They have quite a few listings for Salesforce Consultants and Managers.

I asked Valerie how Salesforce Consulting differs from other types of consulting. Here’s what she had to say:

“Salesforce provides the ability to move to a more virtual model of remote consulting which saves money, improves efficiency by reducing time wasted on travel as well as improving the life expectancy of a consultant by providing a better work/life balance.

“Implementing SFDC follows a more iterative approach to consulting as opposed to longer more drawn out waterfall consulting approach which is more common with on-premise applications such as SAP, Oracle, Siebel. This leads to faster implementations, real time development and a more interactive experience for the client.

“Because of the easy to use interface, it does allow for functional folks to develop code where as with other systems this work could only be achieved by dedicated technical resources.”

I certainly agree with that last statement. I remember the first time I met with a Salesforce sales person. They were able to answer all the technical questions themselves, which is not the norm with other technologies.

I also agree that implementations are quicker because you can prototype solutions immediately, rather than having to build the basic technology first. Not good for earning heaps of revenue for a consulting firm, but I guess that’s their problem!

For those people interested, Valerie can be contacted at vday@ (followed by the obvious domain name).

The Bottom Line

  • Salesforce-skilled people are in demand out there
  • Salesforce implementations tend to be shorter than the traditional on-premise applications