Discussion:
[gambit-list] Making UTF-8 the default for Gambit
Marc Feeley
2017-08-15 14:15:01 UTC
Do you want:

1) the compiler to accept source code in UTF-8 character encoding?
2) compiled programs to read and write characters using UTF-8 character encoding?

#1 can be done with “gsc -:f8 …”
#2 can be done by starting your program with the shebang “#! /usr/bin/env gsi -:t8,f8,-8”.

A configure option could be added to set the default character encoding.

Should the system obey the LC_CTYPE or LC_ALL variable? Good question! Is this “expected” behavior? It wouldn’t be hard to implement, and still allow overriding with the runtime options. However, gsc and gsi will become more fragile and dependent on the run time environment…
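
To make the environment dependence concrete, here is a minimal sketch of how a runtime could derive an encoding from the locale variables. It is not Gambit code: the function name `locale_codeset` is hypothetical, and the only thing it relies on is the POSIX lookup order for the character-type category (`LC_ALL`, then `LC_CTYPE`, then `LANG`).

```python
def locale_codeset(env):
    """Return the codeset named by the POSIX locale variables, or None.

    `env` is any mapping of environment variable names to values.
    This is an illustrative helper, not part of Gambit.
    """
    # POSIX precedence for the character-type category.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            # Locale names look like language[_territory][.codeset][@modifier].
            base = value.split("@", 1)[0]
            if "." in base:
                return base.split(".", 1)[1]
            return None  # e.g. "C" or "POSIX": no explicit codeset
    return None
```

Calling `locale_codeset(os.environ)` on two machines can give two different answers for the same program, which is the fragility in question; an explicit runtime option would still take priority over whatever this returns.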

What’s best?

Marc
Is it possible to build Gambit such that UTF-8 is the default for I/O?
Or should it heed the LC_CTYPE or LC_ALL variable?
I had a bug while compiling gerbil code that vyzo traced to
disagreement between I/O options used by gambit and gerbil. The
workaround was to export GAMBOPT=t8,f8,-8 rather than fight to
propagate the proper options to gsc. I would like to build gambit with
these options as the default, so I can distribute a version that works
out of the box.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
He wa'n't no common dog, he wa'n't no mongrel; he was a composite.
A composite dog is a dog that is made up of all the valuable qualities
that's in the dog breed — kind of a syndicate; and a mongrel is made up
of all riffraff that's left over. — Mark Twain
_______________________________________________
Gambit-list mailing list
https://webmail.iro.umontreal.ca/mailman/listinfo/gambit-list
Dimitris Vyzovitis
2017-08-15 14:37:01 UTC
I think we just want a configure option that makes -:t8,f8,-8 the default
to start with.


-- vyzo
Dimitris Vyzovitis
2017-08-15 14:39:18 UTC
Obviously this applies to all encodings, so it generalizes nicely to a
configure option that sets the default encoding.

I concur on the undesirability of runtime-dependent behaviour; this is what
I don't like about the GAMBOPT avenue.

-- vyzo
Hendrik Boom
2017-08-15 14:52:18 UTC
Post by Marc Feeley
1) the compiler to accept source code in UTF-8 character encoding?
2) compiled programs to read and write characters using UTF-8 character encoding?
#1 can be done with “gsc -:f8 …”
#2 can be done by starting your program with the shebang “#! /usr/bin/env gsi -:t8,f8,-8”.
I've found that the way to handle UTF-8 in my programs is *mostly* to
simply ignore the problem. I read bytes, write bytes, and treat anything
outside the usual ASCII character set as simply being more letters.

If the application needs anything else to be done with specific
characters, I treat this as part of the lexical analyser instead of as
part of the character reader.

There's no conversion of everything to and from wide characters on input
and output.

There are likely applications in which this isn't efficient. And there
are likely code points which aren't properly handled this way. But
there are a lot for which this is all that is needed.

-- hendrik
Marc Feeley
2017-08-16 00:25:19 UTC
I’ve just pushed a patch to allow specifying at configure-time the default runtime options. For example

./configure --enable-default-runtime-options="f8,-8,t8"

Then running

gsi

will give the same result as

gsi -:f8,-8,t8

This approach has the advantage that other runtime option defaults can be set, for example

./configure --enable-default-runtime-options="m10000"

will set the minimum heap size to 10MB by default. Note that it is possible to override these defaults by giving explicit command line runtime options or using the GAMBOPT environment variable:

gsi -:m5000

Marc