How to estimate log generation rates
In this expert response, Mike Chapple explains why estimating log generation rates is so difficult.
Why is this so difficult? Log generation rates vary significantly with both the configuration of devices and the load placed on them. For example, you and I may both run Microsoft SQL Server databases, but I may have the logging and auditing settings configured to track almost every activity the database performs, while you may have minimal (or no) logging configured. Additionally, I may be in a high-load 24x7 data processing environment, while you may be running a database with low transaction volume. Therefore, it's impossible to provide a meaningful estimate of the log volume generated by a "typical" SQL Server database. Add in hundreds or thousands of other diverse devices, and the problem quickly grows in scope.
So how is it possible to develop a meaningful estimate for your environment? There's only one solution: measure your current activity, for example by setting up a simple syslog server and measuring the volume of traffic it receives. If groups of systems are similarly configured, you can save time by measuring the logs generated by a representative sample of your organization's devices and extrapolating from there.
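The extrapolation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SampleGroup` type, its field names, and the figures in the usage example are hypothetical, and the byte counts are assumed to come from whatever your test syslog server recorded during the measurement window. It also assumes that devices within a group behave alike, which only holds if they are similarly configured and similarly loaded.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class SampleGroup:
    """One class of similarly configured devices (hypothetical structure)."""
    name: str
    sampled_devices: int    # devices that sent logs to the test syslog server
    total_devices: int      # all devices of this kind in the organization
    sampled_bytes: int      # bytes received from the sampled devices
    window_seconds: float   # length of the measurement window


def estimate_daily_bytes(groups):
    """Extrapolate daily log volume from per-group sample measurements."""
    total = 0.0
    for g in groups:
        # Average rate of a single sampled device, in bytes per second.
        per_device_rate = g.sampled_bytes / g.sampled_devices / g.window_seconds
        # Scale to every device in the group, then to a full day.
        total += per_device_rate * g.total_devices * SECONDS_PER_DAY
    return total


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    groups = [
        SampleGroup("database servers", 2, 10, 7_200, 3_600),
        SampleGroup("web servers", 5, 50, 18_000, 3_600),
    ]
    print(f"Estimated volume: {estimate_daily_bytes(groups):,.0f} bytes/day")
```

A one-hour window like the one in this example will miss daily and weekly peaks, so a measurement period that covers your busiest times gives a more trustworthy figure.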