Archiving Old Data
Summary and Scope
This note describes how to use the new data archiving feature in Observer GigaFlow.
Observer GigaFlow is designed to fill available storage space, i.e. data storage is based on storage capacity rather than time.
On a system with unlimited storage, all data would be retained forever.
In practice, the quantity of data stored in the working database is limited by the capacity of the disk drive on which the database resides. Additionally, Observer GigaFlow can set aside space on the drive for health monitoring; we recommend that this monitoring is always enabled.
As the disk drive reaches capacity, Observer GigaFlow must remove data. This is done by age, i.e. the oldest data are removed first. Until recently, when flow data was removed from the database, it was simply deleted.
However, long term storage of flow data may be useful in some situations, e.g. to meet compliance objectives or to support investigations.
In the newest builds, we have introduced an archiving feature that allows indefinite storage of the most detailed Forensics data.
Storage Settings and the new Archiving Feature
You can find these settings in the application at System > Global > Storage Settings.
In the Storage settings box, you can:
- Enable or disable drive space monitoring.
- Set the drive to monitor, e.g. C:\.
- Set the minimum free space allowed (GB).
- Set the default device storage space (GB).
- Set the minimum forensics storage (Days), e.g. 21. After this time, flowsec records will be deleted.
- Set the IP search storage (Days), e.g. 21. After this time, IP address history will be deleted.
- Set the forensics table cache size, e.g. 10,000. This is the number of entries cached before writing to disk.
- Set the forensic table cache age (milliseconds), e.g. 10,000. After 10 seconds, the forensic data is written to disk.
- Set the forensic cache storage size, e.g. 40,000.
- Enter forensics indexes. This is a comma-delimited list of forensics table field names, e.g. "srcadd,dstadd,appid". See Reports > Forensics in the Reference Manual for more.
- Set the forensic rollup age (Days), e.g. 4 days, the period after which data is rolled into daily tables.
- Set the event storage period (Days), e.g. 100 days, how long events are retained.
- Set the ARP storage period (Days), e.g. 100 days, how long ARP entries are retained.
- Set the CAM storage period (Days), e.g. 100 days, how long CAM entries are retained.
- Set the event summary storage period (Days), e.g. 200 days.
- Set the interface summary storage period (Days), e.g. 200 days.
- Enable or disable Auto Tune of the PostgreSQL database (Yes or No).
And, in the newest builds of Observer GigaFlow:
- Set the archive folder location; the default is c:\temp\.
- Import existing archive(s).
- Enable archiving.
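The forensics table cache settings above describe a common write-buffering pattern: records are held in memory and flushed to disk once the cache reaches a maximum number of entries or a maximum age, whichever comes first. The sketch below illustrates that size-or-age flush policy under the example values (10,000 entries, 10,000 ms); the class and method names are hypothetical and do not reflect Observer GigaFlow's internals.

```python
import time

class ForensicsCache:
    """Illustrative write cache flushed when it reaches a maximum size
    or a maximum age, mirroring the 'forensics table cache size' and
    'forensic table cache age' settings. Names are hypothetical."""

    def __init__(self, max_entries=10_000, max_age_ms=10_000):
        self.max_entries = max_entries
        self.max_age_ms = max_age_ms
        self.entries = []
        self.first_entry_at = None

    def add(self, record):
        """Buffer a record; return the flushed batch if a limit was hit."""
        if self.first_entry_at is None:
            self.first_entry_at = time.monotonic()
        self.entries.append(record)
        if self._should_flush():
            return self.flush()
        return []

    def _should_flush(self):
        age_ms = (time.monotonic() - self.first_entry_at) * 1000
        return len(self.entries) >= self.max_entries or age_ms >= self.max_age_ms

    def flush(self):
        """Write the cached records to disk (simulated) and reset the cache."""
        written, self.entries = self.entries, []
        self.first_entry_at = None
        return written
```

With the defaults, a quiet sensor still has its data persisted within about 10 seconds, while a busy one is flushed in batches of 10,000 entries.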
Data Retention and Rollup and the new Archiving Feature
You can find this information in the manual at System > Global > Data Retention and Rollup.
The minimum Forensics data storage is 21 days. Forensics data is the lowest level flow data stored by Observer GigaFlow. Forensics data is stored in tables for up to four hours to speed up search and reporting. These tables are rolled into one-day tables after the Forensics Rollup Age period; this is four days by default (see System > Global).
With drive monitoring enabled, additional space is set aside. Observer GigaFlow will fill the disk drive(s) until the pre-defined minimum amount of free space is left. Observer GigaFlow also caps the storage that any particular device uses for forensics data; this is 2 GB per device by default. The cap can be changed globally and on a per-device basis, which in turn sets an overall cap on the amount of space used by Observer GigaFlow.
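The two limits above, minimum free space on the monitored drive and the per-device forensics cap, can be read as a simple purge decision. The function below sketches that decision; the parameter names are illustrative and not taken from the product.

```python
def purge_reason(free_space_gb, min_free_gb, device_usage_gb, device_cap_gb=2.0):
    """Return why old data must be removed, or None if within limits.

    Mirrors drive-space monitoring (fill the drive until the pre-defined
    minimum free space is left) and the per-device forensics storage cap
    (2 GB per device by default). Illustrative sketch only."""
    if free_space_gb < min_free_gb:
        return "drive below minimum free space"
    if device_usage_gb > device_cap_gb:
        return "device over forensics storage cap"
    return None
```

When either condition triggers, the oldest data is removed first, as described earlier.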
|Type|Resolution|Table Duration (default)|Retention|Setting Involved|
|---|---|---|---|---|
|Raw Flows|millisecond|1 hour to 1 day|21 days|Min Free Space, Default Device Storage Space, Min Forensics Storage, Forensics Rollup Age|
|IP Search|millisecond|1 day|21 days|IP Search Duration|
|Events|millisecond|4 hours|100 days|Event Storage Period, Event Summary Storage Period|
|ARP|millisecond|1 day|100 days|ARP Storage Period|
|CAM|millisecond|1 day|100 days|CAM Storage Period|
|Interface Summaries|minute|2 days|200 days|Interface Summary Storage Period|
|Traffic Summaries|hour|7 days|200 days|Interface Summary Storage Period|
With the new archiving feature enabled, old data can be exported to an archive before being removed from the working database, i.e. when device or drive storage limits have been reached.
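The age-out behaviour with archiving enabled amounts to export-before-delete: the oldest data is selected, written to a separate archive file, and only then removed from the working database. The sketch below illustrates that ordering under assumed names; it is not the product's implementation, and the JSON file format is purely for illustration.

```python
import json
from pathlib import Path

def age_out(tables, archive_dir, archiving_enabled):
    """Remove the oldest table from the working set; if archiving is
    enabled, export it to a separate archive file first (one file per
    export). 'tables' is an oldest-first list of (name, rows) pairs.
    Illustrative sketch; names and format are hypothetical."""
    name, rows = tables.pop(0)  # oldest data is removed first
    if archiving_enabled:
        path = Path(archive_dir) / f"{name}.json"
        path.write_text(json.dumps(rows))  # export before delete
        return path
    return None  # archiving disabled: data is simply deleted
```

The key property is that the export completes before the working copy is discarded, so reaching a storage limit never silently destroys data once archiving is on.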
Storage Notes and Requirements
For performance and stability, we recommend that the data archive is stored on a different disk drive from the main working database.
The archive location must have sufficient storage space.
Additionally, we recommend that the archive drive is backed-up periodically.
Each data export is stored as a separate archive file.
The archive feature can be used together with, or separately from, other archive arrangements. For example, the Observer GigaFlow working database may be located on storage that supports 'snapshot' features. Finally, the archived flow data retains all 'enrichment' provided by Observer GigaFlow during its initial processing.
Importing Archive Data
Archive data can be reimported into Observer GigaFlow. Reinstated archive data is restored to the database and clearly labelled. Reinstated archive data is sticky, i.e. it is not automatically removed from the database and must be manually deleted.
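The "labelled and sticky" behaviour above can be sketched as tagging each restored row so that the automatic age-out skips it, leaving manual deletion as the only way to remove it. The functions and field names below are hypothetical, chosen only to illustrate the rule.

```python
def reimport_archive(database, archive_rows, label="archive"):
    """Restore archived rows into the working database, tagging each so
    it is clearly labelled and 'sticky' (exempt from automatic removal).
    Illustrative sketch; field names are hypothetical."""
    for row in archive_rows:
        database.append({**row, "source": label, "sticky": True})

def auto_removal_candidates(database):
    """Rows eligible for automatic age-out; sticky rows are excluded
    and must be deleted manually."""
    return [r for r in database if not r.get("sticky")]
```

Because reinstated rows never appear among the removal candidates, they accumulate until deleted by hand, which is why a separate instance is recommended for working with retrieved archives.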
For this reason, we recommend that a separate Observer GigaFlow instance is used to work with retrieved archive data.
In summary, the new archive feature provides an efficient way to store Observer GigaFlow data indefinitely.