We all know Facebook likes user data; it helps Facebook improve its product offerings and serve better targeted ads. Ever wonder just how much data Facebook collects? Would you believe me if I told you Facebook collects more than 500 TB — or 512,000 GB — of data daily? Yeah, it’s true.
In a short presentation today, Jay Parikh, Facebook’s VP of infrastructure engineering, revealed some stats on Facebook data collection and management. According to Parikh, every day Facebook users:
- ‘Like’ 2.7 billion times
- Upload 300 million photos
- Share 2.5 billion content items
In total, Facebook collects 500+ TB of new data every day. All this data is stored in a 100+ PB (petabytes; one petabyte is roughly 1,000 TB) Hadoop cluster, which Facebook claims is larger than the clusters run by other companies (and I believe them). This Hadoop cluster runs 70,000 database queries each day and can scan 105 TB of data every 30 minutes, something Facebook employees do often to measure product performance.
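To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures Parikh quoted and assumes binary units (1 TB = 1,024 GB, 1 PB = 1,024 TB), the same convention behind the 500 TB = 512,000 GB conversion above. The variable names are mine, and since the "500+" and "100+" figures are lower bounds, treat the results as ballpark estimates only.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the presentation.
# Assumption: binary units, matching the 500 TB = 512,000 GB conversion.
GB_PER_TB = 1024
TB_PER_PB = 1024
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted by Parikh (the "+" figures are treated as lower bounds).
daily_ingest_tb = 500      # new data collected per day
cluster_size_pb = 100      # size of the Hadoop cluster
scan_tb = 105              # data scanned ...
scan_minutes = 30          # ... in this many minutes
queries_per_day = 70_000   # queries run on the cluster each day

# Daily ingest expressed in GB.
daily_ingest_gb = daily_ingest_tb * GB_PER_TB

# Average scan throughput implied by "105 TB every 30 minutes".
scan_rate_gb_per_s = scan_tb * GB_PER_TB / (scan_minutes * 60)

# Average query rate over a full day.
queries_per_second = queries_per_day / SECONDS_PER_DAY

# How many days of ingest at this rate equal the cluster size.
# This ignores replication, compression, and deletion, so it is only
# a sense of scale, not a capacity forecast.
days_to_match_cluster = cluster_size_pb * TB_PER_PB / daily_ingest_tb

print(f"Daily ingest: {daily_ingest_gb:,} GB")
print(f"Scan throughput: {scan_rate_gb_per_s:.1f} GB/s")
print(f"Average query rate: {queries_per_second:.2f} queries/s")
print(f"Days of ingest equal to cluster size: {days_to_match_cluster:.1f}")
```

Under those assumptions, the scan rate works out to about 60 GB per second, the query load averages just under one query per second, and the cluster holds the equivalent of roughly 205 days of new data at the current collection rate.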
Interestingly enough, Parikh mentions this data is not just for the Facebook employees charged with selling advertisements. It is made available to all Facebook product teams, regardless of whether they are trying to sell ads or build new features, because the data Facebook collects is important for improving Facebook services across the board. If this broad access makes you question the security of your data, Parikh says Facebook has a “zero-tolerance policy” for abuse of data, and all data access is logged and monitored. Now isn’t that just sweet? Thanks, Facebook!