
dotnet internationalization : Removing BOM from a UTF-8 file


Jochen Kalmbach
10/2/2004 9:23:12 AM

Open it in notepad and save it as "ANSI".

--
Greetings
Jochen

My blog about Win32 and .NET
Jochen Kalmbach
10/2/2004 10:07:56 AM

Yes.

--
Greetings
Jochen

Michael \(michka\) Kaplan [MS]
10/2/2004 11:11:58 AM
The BOM is not visible in Internet Explorer any time that either:

a) IE recognizes the file format (which is to say, usually), or

b) the code point is in the font as a ZERO WIDTH NO BREAK SPACE (which is
again to say, usually)

You can try right-clicking on the page and verifying the encoding in the
[unlikely] event that both (A) and (B) are not true.
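The character MichKa refers to is U+FEFF. As a quick illustration (Python is used here only as a convenient checking tool; nobody in the thread uses it), its official Unicode name and its UTF-8 encoding can be printed like this:

```python
import unicodedata

BOM = "\ufeff"  # the byte order mark code point, U+FEFF

# Unicode's official name for U+FEFF
print(unicodedata.name(BOM))          # ZERO WIDTH NO-BREAK SPACE

# Encoded as UTF-8 it becomes the three bytes EF BB BF
print(BOM.encode("utf-8").hex(" "))   # ef bb bf
```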


--
MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies
Windows International Division

This posting is provided "AS IS" with
no warranties, and confers no rights.



aa
10/2/2004 5:08:54 PM
I have a UTF-8 PHP file that I edit with Notepad (W2K), which suddenly started
showing a gap at the top of the page in IE6.
View Source shows a square at the beginning of the file, which, I guess, is
the BOM.
How do I remove it?
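One way to remove the BOM without converting the file away from UTF-8 is to drop its first three bytes when they are EF BB BF. The following is a minimal Python sketch of that idea, not something proposed in the thread; the helper names `strip_utf8_bom_bytes` and `strip_utf8_bom_file` are hypothetical.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def strip_utf8_bom_bytes(data: bytes) -> bytes:
    """Return data without a single leading UTF-8 BOM, if one is present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def strip_utf8_bom_file(path) -> bool:
    """Rewrite the file without its leading BOM; return True if one was removed.

    Only the first three bytes are dropped. Everything else, including any
    non-ASCII characters, is written back byte for byte, so nothing is lost.
    """
    p = Path(path)
    data = p.read_bytes()
    stripped = strip_utf8_bom_bytes(data)
    if len(stripped) == len(data):
        return False  # no BOM, leave the file untouched
    p.write_bytes(stripped)
    return True
```

For example, `strip_utf8_bom_file("index.php")` (a hypothetical file name) would clean a single page; any files that page includes would need the same treatment.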

aa
10/2/2004 5:58:14 PM
> Open it in notepad and save it as "ANSI".

Then I will lose all the non-ANSI data?
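Whether data is lost depends on the characters in the file. The sketch below assumes the ANSI code page is Windows-1252 and uses Python's `errors="replace"` to stand in for whatever substitution Notepad performs (Notepad's exact behaviour may differ): characters the code page can represent survive, the rest become `?`.

```python
# Simulate "Save As... ANSI", assuming the ANSI code page is Windows-1252
text = "Prix: 10 € / Москва"

# Characters with no cp1252 equivalent are replaced by '?'
ansi = text.encode("cp1252", errors="replace")

# The euro sign exists in cp1252 and survives; the Cyrillic word does not
print(ansi.decode("cp1252"))  # Prix: 10 € / ??????
```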

Joerg Jooss
10/2/2004 9:10:46 PM

"aa" <aa@virgin.net> wrote in message
news:O9X$LnJqEHA.1576@TK2MSFTNGP12.phx.gbl...

Certain editors like SciTE allow you to save UTF-8 files either with or
without BOM.
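Python (used here only for illustration) exposes the same two choices as separate codecs: `utf-8-sig` writes a BOM and `utf-8` does not. A short sketch:

```python
text = "héllo"

# "utf-8-sig" prepends a BOM when encoding; plain "utf-8" never does
print(text.encode("utf-8-sig"))  # b'\xef\xbb\xbfh\xc3\xa9llo'
print(text.encode("utf-8"))      # b'h\xc3\xa9llo'

# When decoding, "utf-8-sig" drops a leading BOM if there is one and
# otherwise behaves like "utf-8", so it reads both kinds of file
print(b"\xef\xbb\xbfh\xc3\xa9llo".decode("utf-8-sig"))  # héllo
print(b"h\xc3\xa9llo".decode("utf-8-sig"))              # héllo

# Plain "utf-8" keeps the BOM as a U+FEFF character in the text
print(repr(b"\xef\xbb\xbfhi".decode("utf-8")))          # '\ufeffhi'
```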

Cheers,

--
Joerg Jooss
joerg.jooss@gmx.net

aa
10/3/2004 11:27:27 AM
Thanks,
right-clicking --> Encoding shows Unicode (UTF-8).
However, the file in question is a PHP file which includes another UTF-8 PHP
file at the very beginning using the PHP include statement.
That second file, I guess, has its own BOM, which ends up somewhere after
the first BOM and might therefore be rendered visibly in the browser.
In a non-Unicode text editor this shows up as ï»¿
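aa's guess can be checked by searching for the BOM bytes anywhere in the output rather than only at the start. This is a hypothetical Python sketch (`bom_offsets` is an illustrative helper); it assumes the included file's BOM is emitted directly after the main file's BOM, which simplifies what PHP actually sends. Python's `utf-8-sig` codec strips only a leading BOM, so a second one survives decoding as an ordinary U+FEFF character.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def bom_offsets(data: bytes) -> list:
    """Return every byte offset at which EF BB BF occurs in data."""
    offsets = []
    i = data.find(UTF8_BOM)
    while i != -1:
        offsets.append(i)
        i = data.find(UTF8_BOM, i + len(UTF8_BOM))
    return offsets


# Assumed page output: main file's BOM, then the included file's BOM and HTML
page = UTF8_BOM + UTF8_BOM + b"<h1>Hello</h1>"
print(bom_offsets(page))  # [0, 3]

# Decoding removes only the first BOM; the second remains in the text
text = page.decode("utf-8-sig")
print(repr(text))  # '\ufeff<h1>Hello</h1>'
```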

Some time ago I ran across a similar problem with ASP, but I cannot remember
how I got around it.

"Michael (michka) Kaplan [MS]" <michkap@online.microsoft.com> wrote in
message news:%23o9vDtKqEHA.556@TK2MSFTNGP11.phx.gbl...

Michael \(michka\) Kaplan [MS]
10/3/2004 6:37:59 PM
Those are the bytes of a BOM -- and what they would look like if it was not
detected as UTF-8 (which is not an issue in IE, by your own admission).
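The three characters aa saw are what the BOM bytes EF BB BF become when a program reads them as Windows-1252 instead of UTF-8. A small Python sketch, assuming Windows-1252 as the "non-Unicode" code page:

```python
UTF8_BOM = b"\xef\xbb\xbf"

# Read as UTF-8, the three bytes are one invisible character, U+FEFF
print(repr(UTF8_BOM.decode("utf-8")))  # '\ufeff'

# Read as Windows-1252, each byte becomes its own visible character
print(UTF8_BOM.decode("cp1252"))       # ï»¿
```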

If you are combining files in something and no one is removing the
superfluous BOM, then make sure you view it with a font that recognizes it as
a ZERO WIDTH NO BREAK SPACE. I know that it can read Unicode in UTF-8 (since
you claim there are many international characters in the file?).

In other words, everything you have discussed so far should cause no problem.
Eventually you may need to ask the question in a more relevant forum for the
responsible technology (PHP?).


--
MichKa [MS]
NLS Collation/Locale/Keyboard Technical Lead
Globalization Infrastructure and Font Technologies
Windows International Division

This posting is provided "AS IS" with
no warranties, and confers no rights.




Jeremy Pullicino
10/11/2004 1:45:55 PM
Is the BOM the special hex numbers found at the beginning of UTF-8 files when
saved with Notepad?

Are UTF-8 text files with no BOM valid UTF-8? If so, how can my application
detect whether a file is in UTF-8 or ANSI?

Jeremy.


Joerg Jooss
10/11/2004 10:25:15 PM
> Is the BOM the special hex numbers found at the beginning of UTF-8 files
> when saved with notepad?

Yes. Notepad always prepends the BOM.

> Are UTF-8 text files with no BOM valid utf-8?

Yes. A UTF-8 BOM is optional.

> If so, how can my application detect that a file is in UTF-8 or ANSI?

That's impossible. Even a BOM is a valid (though rather likely meaningless)
character sequence in ANSI (and the next question would be what's ANSI?
Windows 1252? Windows 1250?).
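A common heuristic, consistent with Joerg's caveat that no check can be certain, is: treat a leading BOM as UTF-8; otherwise try a strict UTF-8 decode and fall back to an ANSI code page if it fails. The Python sketch below assumes Windows-1252 as that fallback; `guess_encoding` is a hypothetical helper, not a library function. The last usage line shows the heuristic guessing wrong.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def guess_encoding(data: bytes, ansi: str = "cp1252") -> str:
    """Heuristically guess whether bytes are UTF-8 or an ANSI code page.

    This is a guess, not a proof: pure ASCII is valid in both, and some
    ANSI text happens to also be valid UTF-8. `ansi` is an assumed
    fallback code page.
    """
    if data.startswith(UTF8_BOM):
        return "utf-8-sig"  # leading BOM: very likely UTF-8
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return ansi  # not valid UTF-8, so assume the ANSI code page
    return "utf-8"  # valid UTF-8 (this includes plain ASCII)


print(guess_encoding("héllo".encode("utf-8-sig")))  # utf-8-sig
print(guess_encoding("héllo".encode("utf-8")))      # utf-8
print(guess_encoding("héllo".encode("cp1252")))     # cp1252
print(guess_encoding(b"plain ascii"))               # utf-8
# Wrong guess: these bytes were written as cp1252 text, yet are valid UTF-8
print(guess_encoding("Ã©".encode("cp1252")))        # utf-8
```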

Cheers,

--
Joerg Jooss
www.joergjooss.de
news@joergjooss.de
