November 19, 2010
Tags: Apex, Data Loader

I was in the audience at Dreamforce 2009 when Marc Benioff first demonstrated Chatter. Personally, I was unimpressed because my workplace keeps all its corporate knowledge on a Confluence wiki, which includes the ability to add comments on pages and track activities.

So, I never bothered delving into Chatter. I didn’t even turn it on. Nonetheless, the dear folks at Salesforce activated it by default. This resulted in some cute ‘feed’ emails and users started adding a picture to their profile.

After a while, the emails turned from ‘fun’ to ‘annoying’ because we have an automated load process that loads hundreds of records several times a day. So, I found the Customize/Chatter settings in Setup and turned off emails. All done and dusted, right? Wrong!

A week or so later, I get a call from my local Salesforce office. “Did you know that your storage has increased dramatically lately?”

No.

So I looked in Storage Usage and was flabbergasted to see this:

Apparently the system had created over 16 million Chatter “Feed Tracked Changes” records, occupying 4GB of storage. That’s quite impressive given that I’ve got a 1GB data quota!

So, I immediately turned off Chatter and waited for my Salesforce contact to get me something called “Chatter Janitor” that could help clean up the mess. In the meantime, I searched discussion boards for a solution, only to find that other people had the same problem and couldn’t figure out how to delete the records!

Attempt 1: Via System Log

Fortunately I came across a Purge Chatter Feed discussion on the Community forums. It shows how to delete feeds via the “Execute Apex” window of the System Log. I’ve simplified their example to do the following:


OpportunityFeed[] feed = [Select Id from OpportunityFeed limit 10000];
delete feed;

Unfortunately, it didn’t work for me. Eventually I discovered that the Feed records are only available if Chatter is turned on, so I had to activate it again.

The above bit of code deletes 10,000 records at a time, which is the maximum allowable under the Governor Limits. Unfortunately, with my 16 million records, this would take 1600 executions of the code. That’s a lot of clicking!

I started doing it and things were moving pretty quickly, until my scheduled batch load activated and I ended up with even more Feed records than when I started. Arrgh!

Then, thanks to some hints from David Schach, I found that I could turn off Chatter on specific objects. I hadn’t realised this at first because the option only appears if Chatter is turned on!

Attempt 2: Anonymous Apex

Okay, hitting the “Execute” button in the System Log 1600 times didn’t sound like fun, so I thought I’d check how to automate it. I soon figured out how to call executeAnonymous via an API call. All I’d need to do is repeatedly call the above Apex code.

I used the SoapUI plugin for IntelliJ (you can also use it standalone or in Eclipse), which makes it very easy to create XML for SOAP calls and even automate them using a Groovy script. This generally worked well, but it could still only delete 10,000 records per SOAP call, and each call took a while to execute and occasionally timed out. So, this wasn’t going to be the perfect solution, either.

Attempt 3: Batch Apex

I did some research and found that Batch Apex is not bound by the usual governor limit on the number of records a query can return if it is invoked with a QueryLocator. Well, it actually has a 50 million record limit, but that was good enough for me!

The online documentation is pretty good and I eventually created this class:

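// Batch Apex job that deletes Chatter feed records for Opportunities
global class ZapChatter implements Database.Batchable<sObject>{

  global ZapChatter() {
    System.Debug('In Zap');
  }

  // Called once at the start: selects the records the job will process
  global Database.QueryLocator start(Database.BatchableContext BC) {
    return Database.getQueryLocator('Select Id from OpportunityFeed limit 1000000');
  }

  // Called once per batch: 'scope' holds the records in the current batch
  global void execute(Database.BatchableContext BC, List<sObject> scope) {
    delete scope;
  }

  // Called once after the last batch; nothing to clean up here
  global void finish(Database.BatchableContext BC) {
  }

}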

Batch Apex works by taking a large number of records and then processing them in ‘batches’ (obvious, eh?). So, in the above code, the start method selects 1 million records and then the execute method is called for every batch. It is invoked with:

id batchinstanceid = database.executeBatch(new ZapChatter(), 10000);

This can be executed in the System Log or via the Force.com IDE ‘Anonymous Apex’ facility. The “10000” second parameter tells Batch Apex to use batches of 10,000 records.
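To make the start/execute split concrete, here is a rough model of the batching flow. It is written in Python rather than Apex so it can be run anywhere, and it only imitates the behaviour described above: it is not how the platform actually schedules batches. The names `chunked` and `run_batch` are made up for this illustration.

```python
from typing import Callable, Iterator, List


def chunked(records: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def run_batch(records: List[str], batch_size: int,
              execute: Callable[[List[str]], None]) -> int:
    """Call execute once per batch and return the number of batches.

    'records' plays the role of the rows selected by start();
    'execute' plays the role of the execute method (here, any callable).
    """
    batches = 0
    for scope in chunked(records, batch_size):
        execute(scope)
        batches += 1
    return batches


# Example: 25 fake record IDs processed in batches of 10
seen_sizes: List[int] = []
fake_ids = [f"id{i}" for i in range(25)]
batch_count = run_batch(fake_ids, 10, lambda scope: seen_sizes.append(len(scope)))
print(batch_count, seen_sizes)  # 3 [10, 10, 5]
```

The last batch is simply whatever is left over, which is why the final call sees 5 records instead of 10.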

Things worked, and didn’t work. I found that small batches of 200 or 1000 got executed very quickly. Batch sizes of 10,000 took a long time “in the queue”, taking 3 to 15 minutes between batches, probably a result of other workload in the system.

I then got greedy and tried one batch of 1 million records. This took 50 minutes to start the batch, only to fail with an error “Too many DML rows: 1000000”.

I then selected ALL 16 million records and requested a batch size of 10,000. This took 5 hours before it started the batches, with a total of 1674 batches required. I left it overnight but it didn’t run many batches, presumably because large batches are given low priority.

Attempt 4: Deleting via Data Loader

During all this fun, I lodged a Support Case with Salesforce to obtain their advice. They suggested using the Data Loader to export a list of Feed IDs and then load them in a Delete operation. I also discovered that Chatter has to be activated, otherwise Data Loader will not show the Feed objects (eg AccountFeed, OpportunityFeed).

This method did work. However, the speed of deletion was not great. I was only getting a rate of about 100,000 records per hour, probably due to my low upload bandwidth from home. (Yes, this had already occupied my work day, and was seeping into my evening, too!) At that rate, it would still take 160 hours to complete — that’s a full week!

What’s worse, the Data Loader normally works in batches of 200. This would require 80,000 API calls to delete the records, and we have a limit of 28,000 API calls per day. So, that’s 3 days minimum!
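The back-of-envelope numbers above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the figures quoted in this post (16 million records, roughly 100,000 records per hour, batches of 200, 28,000 API calls per day) and assuming one API call per batch; the function name is made up for the example.

```python
import math
from typing import Tuple


def data_loader_estimate(total_records: int, records_per_hour: int,
                         batch_size: int, api_calls_per_day: int) -> Tuple[float, int, int]:
    """Return (hours at the observed rate, API calls needed, days of API quota)."""
    hours = total_records / records_per_hour
    # Assumption: each batch of batch_size records costs one API call
    api_calls = math.ceil(total_records / batch_size)
    quota_days = math.ceil(api_calls / api_calls_per_day)
    return hours, api_calls, quota_days


hours, api_calls, quota_days = data_loader_estimate(16_000_000, 100_000, 200, 28_000)
print(hours, api_calls, quota_days)  # 160.0 80000 3
```

So the transfer rate (160 hours) is the real bottleneck here, with the API quota (3 days) a distant second.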

Attempt 5: Bulk API

Since I’m using all these new Apex technologies, I then thought I’d try the new Bulk API. It’s basically Data Loader on steroids — same interface, but with a few options turned on in the Settings dialog.

Bingo! Load speed went up dramatically. The Bulk API uses parallel processing and you can watch your job while it loads! In my case, it was loading 10 parallel batches, chewing up gobs of “API Active Processing time”. I upped my batch size to 10,000 so my test file of 100,000 records loaded in 10 batches. This is handy, because there is a limit of 1,000 batches in a 24-hour period. So, 16 million records would use 1600 batches and would need to be spread across two days.
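Here is a small Python sketch of why the batch size matters against that 1,000-batch limit. It just spreads the required batches over consecutive days, using the figures from this post; `batches_per_day` is a made-up helper, not part of any Salesforce tool.

```python
import math
from typing import List


def batches_per_day(total_records: int, batch_size: int,
                    daily_batch_limit: int) -> List[int]:
    """Return how many batches would run on each day, given a daily batch cap."""
    if batch_size <= 0 or daily_batch_limit <= 0:
        raise ValueError("batch_size and daily_batch_limit must be positive")
    remaining = math.ceil(total_records / batch_size)
    schedule: List[int] = []
    while remaining > 0:
        today = min(remaining, daily_batch_limit)
        schedule.append(today)
        remaining -= today
    return schedule


print(batches_per_day(16_000_000, 10_000, 1_000))  # [1000, 600]
```

With batches of 200 instead of 10,000, the same call returns 80 entries, meaning the job would need 80 days of batch quota rather than two.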

Since I’m in Australia and the speed of light somewhat impacts data transfers, I configured the Data Loader’s “Batch mode” to work from our data center in the USA. Attempting to extract 1 million records timed out after 10 minutes before even starting the download, so I dropped down to 100,000 records with a maximum extractionReadSize of 2000 (which is the download batch size). This took 4½ minutes to run. The upload took only 6 seconds (wow!) and the delete job took 7 minutes to run.

I then settled down, deleting in batches of 500,000. Success at last!
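If you end up with one large export of Feed IDs, one way to get delete files of 500,000 rows each is to split the CSV before loading it. This is a minimal sketch, not the method used above (each extract there was simply run at the size wanted). The output file names and the assumption of a single header row are illustrative only.

```python
import csv
from pathlib import Path
from typing import List


def _write_chunk(out_dir: Path, index: int, header: List[str],
                 rows: List[List[str]]) -> Path:
    """Write one chunk, repeating the header so each file loads on its own."""
    path = out_dir / f"delete_{index:03d}.csv"  # hypothetical naming scheme
    with path.open("w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def split_id_file(source: Path, out_dir: Path,
                  rows_per_file: int = 500_000) -> List[Path]:
    """Split a CSV with one header row into files of at most rows_per_file rows."""
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with source.open(newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty export: nothing to split
            return written
        chunk: List[List[str]] = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_file:
                written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
                chunk = []
        if chunk:  # leftover rows that did not fill a whole file
            written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
    return written
```

For example, `split_id_file(Path("feed_ids.csv"), Path("chunks"))` would write `chunks/delete_001.csv`, `chunks/delete_002.csv` and so on, where `feed_ids.csv` stands for whatever your extract produced.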

The Bottom Line

  • Chatter generates Feed objects when specified fields change
  • If you’re loading lots of records via the API, this might generate lots of Feed records
  • The feed records (eg AccountFeed, ContactFeed, OpportunityFeed) can consume a lot of space
  • If you want this to stop, turn off the individual Feeds (Setup, Customize, Chatter, Feed Tracking) but keep Chatter turned on for now
  • If you’ve only got a few hundred thousand records, it’s easiest to delete them via the System Log
  • If you’ve got millions of records, use the Data Loader and Bulk API to extract then delete them
  • When you’re all done, turn off Chatter. Phew!

5 Responses to “Too much Chatter”

  1. Gregg Johnson Says:

    John – I’m in Product Management at Salesforce, work on Chatter, and saw your note. I wanted to apologize that you ended up in this situation – obviously given your explanation this wasn’t an ideal experience. We’ve recognized that a few customers like you are ending up in this situation when doing mass loads via the API. There are some solutions to this problem (for example, you can add some detail to a SOAP header that will “turn off” tracked changes for the operation you’re conducting), but these certainly aren’t as discoverable as we’d like — your experience is a testament to that.

    We’re working on a more scalable solution to this issue and will get you more information as soon as we can.

    Appreciate your business, as well as your patience on this particular issue.

  2. The Enforcer Says:

    Hi Gregg,

    No problems — to be honest, it was kind of fun! I would never have had a chance to play with Batch Apex or the Bulk API otherwise.

    In fact, I’m thinking of moving a lot of my jobs to the Bulk API since it’s so much more efficient!

  3. Hannes Says:

    Sorry to hear you had so much trouble with this, but thanks for the write up. It was much fun to read :)

  4. Marty Y. Chang Says:

    I wonder if it would be possible to setup an Agent that runs daily to purge old Chatter records. To get around the Apex limits, maybe @future method calls can be used to delete records.

  5. CloudGofer Says:

    sorry to hear about the trouble you had, enjoyed reading well structured and step by step sequence of events (seemed like a good movie plot:).