Today I was trying to get a list of distinct values in a field in Salesforce having a certain criteria. If SOQL supported ‘select distinct’, this could be accomplished quite trivially. Of course you can’t do ‘select distinct’ with the Salesforce Object Query Language.
Connecting was a cinch, and issuing the command “select distinct whoid from task where field__c != null” got the job done. This command takes a considerable amount of time longer than just a select for a large set of records. But is it longer than performing your own de-duping process? It depends on the number of records and fields.
I queried an object that has 1.6 million rows. With my filter criteria on a cached set of data for this object, it took about 8 minutes to download 650k records with Data Loader with the query ‘select whoid from task where field__c != null’. By simply adding the ‘distinct’ keyword and pulling these records into a local file with SQLForce with the following command:
select distinct whoid from task where field__c != null output "distinct_whoid.csv";
The operation took 25 minutes and returned 165k records.