Hi, I have been working on writing relevance to determine whether an Outlook data file (.pst/.ost) is in ANSI or Unicode format. The goal is to find any “legacy format” Outlook files on workstations before they are updated to 2013. The format does not appear in the file properties when you right-click the file in Windows, and I can’t rely on the file extension, because that does not appear to be a differentiating characteristic.
I was experimenting with the following expression to see whether the file headers differ. However, this seems to be more difficult than I expected.
first 500 of lines of file "C:\Users\<userid>\AppData\Local\Microsoft\Outlook\<archive-name>.pst"
If I compare this between two .pst files where I believe one is UNICODE and one is ANSI, there doesn’t seem to be much difference between the two, at least not enough to base a format type classification on.
You might be able to do this using a VBScript run with cscript.exe, which is built into Windows, so you would not need to install a program.
You might be able to do this using the relevance (byte <number> of <file>) to read the raw bytes at a particular location in the file that may be indicative of the format.
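To find out which byte positions are worth checking, it may help to compare the headers of two sample files byte by byte rather than line by line. Here is a rough Python sketch of that idea. The function name and the 512-byte window are my own choices for illustration, not anything specific to Outlook or BigFix.

```python
def differing_offsets(path_a, path_b, length=512):
    """Return (offset, byte_a, byte_b) for each position where the
    first `length` bytes of the two files differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a = fa.read(length)
        b = fb.read(length)
    # Only compare the range that both files actually cover.
    n = min(len(a), len(b))
    return [(i, a[i], b[i]) for i in range(n) if a[i] != b[i]]
```

Running this against one file you believe is ANSI and one you believe is Unicode should give a short list of candidate offsets. Many of them will be checksums or per-file values, so it is worth repeating the comparison with a few more files of each type before trusting any one offset.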
I’m not very familiar with PST files, but if you can find documentation on how to tell the different types of PST files apart, then I may be able to write relevance against it. If you have example code in C# or another language that already does this, the process it uses would help determine how to detect the difference.
I would recommend looking for something early on in a Unicode PST file that designates the format / version / encoding, or find something that is always Unicode within the beginning of the file.
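As far as I can tell from Microsoft's published [MS-PST] file format specification, the header has exactly this kind of field: the file starts with the 4-byte magic "!BDN", and a 2-byte little-endian version value (wVer) sits at offset 10, with 14 or 15 meaning ANSI and 23 or higher meaning Unicode. Please treat those offsets and values as my reading of the spec and verify them against known sample files. The Python sketch below only illustrates the check; the function names are hypothetical.

```python
import struct

PST_MAGIC = b"!BDN"  # assumed magic bytes at offset 0, per my reading of [MS-PST]

def pst_format(header: bytes) -> str:
    """Classify a PST/OST file from its first 12 bytes."""
    if len(header) < 12 or header[:4] != PST_MAGIC:
        return "not a PST/OST file"
    # Assumption: wVer is a 2-byte little-endian integer at offset 10.
    (w_ver,) = struct.unpack_from("<H", header, 10)
    if w_ver in (14, 15):
        return "ANSI"
    if w_ver >= 23:
        return "Unicode"
    return "unknown"

def pst_format_of_file(path: str) -> str:
    """Read just the start of the file and classify it."""
    with open(path, "rb") as f:
        return pst_format(f.read(12))
```

If that holds up on your sample files, the same test could probably be written in relevance by checking whether byte 10 of the file is 14 or 15, but I have not tested that.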