Why is this so difficult? Log generation rates vary significantly with the configuration of each device. For example, you and I may both run Microsoft SQL Server databases, but I may have the logging and auditing settings configured to track almost every activity the database performs, while you may have minimal (or no) logging configured. Additionally, I may be in a high-load 24x7 data processing environment, while you may be running a database with low transaction volume. Therefore, it's impossible to provide a meaningful estimate of the log volume generated by a "typical" SQL Server database. Add in hundreds or thousands of other diverse devices, and the problem quickly grows in scope.
So how is it possible to develop a meaningful estimate for your environment? There's only one solution: measure your current activity by, for example, setting up a simple syslog server and measuring the volume of traffic it receives. If the systems are similarly configured, you can save time by measuring the logs generated by a representative sample of your organization's devices and extrapolating from there.
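As an illustration of the measure-then-extrapolate approach, here is a minimal Python sketch. It assumes the sampled devices send syslog over UDP (one datagram per message) to a listener you control. TCP syslog and TLS are not handled. The names `LogVolume`, `measure_syslog` and `extrapolate` are hypothetical helpers written for this example, not part of any product. Scaling is assumed to be linear, so the estimate is only as good as the assumption that your sample is representative. Note also that binding to the standard syslog port 514 usually requires administrative privileges, so a high port is used in the usage note.

```python
import socket
import time
from dataclasses import dataclass


@dataclass
class LogVolume:
    """Running tally of syslog messages and raw payload bytes received."""
    messages: int = 0
    bytes: int = 0

    def record(self, payload: bytes) -> None:
        self.messages += 1
        self.bytes += len(payload)


def measure_syslog(host: str = "0.0.0.0", port: int = 5514,
                   duration_s: float = 3600.0) -> LogVolume:
    """Listen for UDP syslog datagrams for duration_s seconds and count them.

    Assumption: each UDP datagram carries exactly one syslog message.
    """
    volume = LogVolume()
    deadline = time.monotonic() + duration_s
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while (remaining := deadline - time.monotonic()) > 0:
            sock.settimeout(remaining)  # never block past the deadline
            try:
                payload, _addr = sock.recvfrom(65535)
            except socket.timeout:
                break
            volume.record(payload)
    return volume


def extrapolate(sample: LogVolume, sample_devices: int,
                total_devices: int, duration_s: float) -> dict:
    """Scale a sampled measurement linearly to the whole fleet and to one day."""
    if sample_devices <= 0 or duration_s <= 0:
        raise ValueError("sample_devices and duration_s must be positive")
    fleet_scale = total_devices / sample_devices
    per_day = 86400 / duration_s
    return {
        "events_per_second": sample.messages * fleet_scale / duration_s,
        "messages_per_day": sample.messages * fleet_scale * per_day,
        "bytes_per_day": sample.bytes * fleet_scale * per_day,
    }
```

For example, after pointing 10 representative devices at the listener, you might call `sample = measure_syslog(port=5514, duration_s=3600)` and then `extrapolate(sample, sample_devices=10, total_devices=500, duration_s=3600)`. Because load can differ sharply between busy and quiet periods, a single one-hour window may under- or overstate daily volume. Measuring across a full business cycle gives a safer baseline. The byte count reflects raw message size on the wire, which may differ from the storage a log management tool ultimately consumes.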
This was first published in February 2009