Hacker News | aknoob's comments

A bit off track, but here in India I see all sorts of negligence from the delivery staff of almost every delivery company: Swiggy, Zomato, Uber Eats, etc. Even Ola and Uber cab drivers are reckless. It is high time these companies started taking accountability.


I worked on routing at a logistics company in India. Last-mile delivery had so many special cases that we stopped trying to model them all. Drivers would take routes to avoid cops, or head toward a cop they knew (one they had paid off in the past). Drivers would not go into each other's territories (so two drivers working for the same company would not deliver if their areas overlapped). For delivery people on two-wheelers, all road rules were optional: they would go the wrong way down one-way roads, etc. (we could track them; we called these 'dirty routes'). Drivers would prefer their own routes even when we gave them better ones. Sometimes "better" was vaguely defined: we might think a route that saves 10% of the time is better, while the driver might think "my favorite paan shop is on the other route, so that one is better."


Conceptually, a pointer is a higher and more well-defined level of abstraction than a raw integer, in much the same way that an iterator is a higher and more well-defined level of abstraction than a raw pointer.

The fact that an iterator, a pointer and an integer all tend to be different ways of looking at the same integer value is more or less an implementation detail.
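
A minimal C sketch of that "same value, different abstraction" idea. The function names are made up for illustration, and the byte-distance result assumes a mainstream flat-memory platform; `uintptr_t` itself is an optional type in the C standard.

```c
#include <stdint.h>

/* Round-trip a pointer through an integer type. Where uintptr_t exists,
   a void* converted to it and back compares equal to the original. */
int roundtrip_matches(const int *p)
{
    uintptr_t raw = (uintptr_t)(const void *)p;        /* the "raw integer" view */
    const int *back = (const int *)(const void *)raw;  /* back to the pointer view */
    return back == p;
}

/* Pointer arithmetic is scaled by the element type; the integer view is not.
   Returns the distance in bytes between a and a + 1 (assumes flat addressing). */
uintptr_t step_in_bytes(const int *a)
{
    return (uintptr_t)(const void *)(a + 1) - (uintptr_t)(const void *)a;
}
```

Given `int a[2]`, `a + 1` moves one element, while the integer view of the same step moves `sizeof(int)` bytes. That scaling is the extra meaning the pointer type carries.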


I think the question should be: why is the assignment operator represented using "="?


A null-terminated representation is as close to a fundamental datatype for a string as possible. It is the same in spirit as other fundamental data types in C, like the array. People have built abstractions over these fundamental datatypes over the years.


If it isn't in libc, it doesn't get used by anyone wanting to write portable APIs.


It's not fundamental at all. You can't even represent null bytes in a null-terminated string.

A length prefix is pretty clearly superior.


A string contains characters. NUL is not a character; it's nothing.

"Fundamental" in this case means "matches reality". Having a number at the beginning doesn't match reality as closely as having the string of characters in sequential memory addresses with something to terminate them.

The quick fox made the jump\N

or

27The quick fox made the jump

The second one requires more work to store (a character-counting routine), and needs even more work to handle variable length strings that may exceed 255-ish bytes/characters.

I'm not discounting the benefits of prefixing the length, just saying it's not more fundamental than null-terminating an arbitrary sequence of characters.
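
To make the two layouts concrete, here is a rough C sketch that converts the NUL-terminated form into a one-byte length-prefixed form. `struct pstring` and `pstring_from_cstr` are hypothetical names, not from any standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical Pascal-style layout: one length byte, then the characters. */
struct pstring {
    unsigned char len;   /* 0..255, hence the "255-ish" limit */
    char data[255];      /* no terminator is stored */
};

/* Build the length-prefixed form from a C string.
   Returns 0 on success, -1 if the text does not fit in a one-byte length. */
int pstring_from_cstr(struct pstring *dst, const char *src)
{
    size_t n = strlen(src);      /* the character-counting pass */
    if (n > 255)
        return -1;
    dst->len = (unsigned char)n;
    memcpy(dst->data, src, n);
    return 0;
}
```

For "The quick fox made the jump" the stored length is 27. Anything longer than 255 bytes is rejected, which is where a wider prefix or a variable-length encoding would have to come in.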


"A string contains characters. NUL is not a character; it's nothing."

You already couldn't make this argument stick in the ASCII era, where a string can't contain NUL but can contain SOH (Start of Heading), STX (Start of Text), ETX (End of Text), EOT (End of Transmission), ENQ (Enquiry), ACK (Acknowledge), BEL, BS, HT (horizontal tab), LF, VT (vertical tab), FF (form feed), CR, SO (shift out), SI (shift in), DLE (data link escape), DC1, DC2, DC3, DC4 (device control 1-4), NAK (negative ACK), SYN (synchronous idle), ETB (end of transmission block), CAN (cancel), EM (end of medium), SUB (substitute), ESC (escape), FS (file separator), GS (group separator), RS (record separator), US (unit separator), and DEL, but Unicode makes that argument even sillier. Strings have always contained things that aren't "characters".

The real problem is no matter what in-band character you take as the magical termination character, you will have strings that want that in it, because in the general case strings can contain anything, because C is always asking you to pass them around to things as the general-purpose storage data structure. You can fix that with an escaping scheme, but now you have an escaped string, not just "a string". Since strings do indeed need to be able to carry NUL in the general case, you either must have some sort of scheme for representing them, or expect a ton of errors when things jam the distinguished character into your string when you didn't expect it. (Note that for precisely the same reasons that NUL-termination isn't a good idea, there isn't any way to "filter" wrong NULs. You can't tell.)
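
A small C sketch of the in-band problem: what a NUL-terminated API "sees" in a buffer that legitimately contains a 0 byte. `visible_length` is a made-up helper for illustration.

```c
#include <stddef.h>
#include <string.h>

/* How many bytes a NUL-terminated API would treat as "the string"
   when handed a buffer that really holds n bytes. */
size_t visible_length(const char *buf, size_t n)
{
    /* memchr stops at the first 0 byte, or after n bytes if there is none */
    const char *nul = memchr(buf, '\0', n);
    return nul ? (size_t)(nul - buf) : n;
}
```

For the five bytes `'a', 'b', 0, 'c', 'd'` this returns 2, the same answer `strlen` gives. Nothing in the data says whether that 0 was the intended end or part of the payload.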

You might just barely be able to argue the problem is that C's library mistook NUL-terminated strings for arbitrary-sized arrays that can contain anything, but in C if you want arbitrarily-sized arrays you would then have no choice but to pass the array size around to every call that expected such a thing. The next immediately obvious thing to do is to pack the number together with the array in a struct, and lo, we're back to length-delimited strings.
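
The "pack the number together with the array" step might look something like this in C. `struct bytes` and `bytes_equal` are hypothetical names; this is one possible shape, not an established API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-bytes type: the array and its size travel together. */
struct bytes {
    const unsigned char *data;
    size_t len;
};

/* Equality over the full length; embedded 0 bytes are ordinary data. */
int bytes_equal(struct bytes a, struct bytes b)
{
    if (a.len != b.len)
        return 0;
    return a.len == 0 || memcmp(a.data, b.data, a.len) == 0;
}
```

Two buffers such as `{1, 0, 2}` and `{1, 0, 3}` compare unequal here, whereas a comparison that stops at the first 0 byte could not tell them apart.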

No matter how you slice it, C's got a major foundational screw-up in this area somewhere. If NUL-terminated strings are the bee's knees, C's APIs still took them in way too many places where they are not appropriate, and it caused decades of serious and often exploitable bugs.


> but Unicode makes that argument even sillier

The Unicode Standard (version 10.0, section 23.1 Control Codes) makes it clear that it "specifies semantics for the use" of only 9 of those ASCII control codes you mentioned, i.e. U+0009 to U+000D (HT, LF, VT, FF, CR) and U+001C to U+001F (FS, GS, RS, US). The rest of the 65 ASCII and Latin-1 control codes, except U+0085 (NEL), "constitute a higher-level protocol that is outside the scope of the Unicode Standard".

Particularly about NUL, it says: "U+0000 null may be used as a Unicode string terminator, as in the C language. Such usage is outside the scope of the Unicode Standard, which does not require any particular formal language representation of a string or any particular usage of null."

So Unicode makes that argument less silly.


NUL is a character in the ASCII character set. That is a problem because you cannot create all the strings composed of ASCII characters in C.

But C never claimed to support all ASCII strings. C doesn't even have strings. C just has char arrays, which are byte arrays. When strings were formalized by convention in the stdlibs, clearly the supported strings are 1-255 strings, NUL excluded. That's the character set available for strings in the stdlibs. If you insist on using stdlib strings for some other kind of strings, that's your own problem.


"But C never claimed to support all ASCII strings."

That is precisely my point... there is no well-supported solution in core C for arbitrary binary strings, despite C's extremely frequent use in domains that require them. If you insist on using stdlib strings for other kinds of strings, you do have a problem... but you also have no other choice. Which brings it back to being a language/library problem.

As I already alluded to, C itself doesn't have a problem with length-delimited strings, and there are plenty of libraries you can get for them. But the core library for C does force this problem in your face by leaving you no other choice, and it is a valid criticism of C.

(C is such a disaster that the only thing to do is to leave it behind as quickly as possible. However, if we were somehow stuck with the language itself, there's a lot of ways we could improve the libraries it comes with, as again demonstrated by the many such improved libraries you can get. However, one of the things I've learned from learning a ton of languages over the past couple of decades is that a language almost never manages to escape from its own standard library, and the few that manage it (like D) pay a stiff adoption price in the process. C's standard library has a real problem here, that has caused real bugs, and no amount of wordplay is going to fix those decades of bugs.)


Good point. And I would agree that the error lies in choosing to use strings for inappropriate places.

Also, ETX might have been a good terminator :) I assume NUL was chosen for easier checking (if (char) ...) vs (if (char == 0x03) ...)
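
A quick C sketch of the two checks being compared. `len_nul` and `len_etx` are made-up names, and the ETX-terminated convention is purely hypothetical.

```c
#include <stddef.h>

/* NUL-terminated scan: the loop condition is simply "byte is non-zero". */
size_t len_nul(const char *s)
{
    size_t n = 0;
    while (s[n])
        n++;
    return n;
}

/* Same scan with a hypothetical ETX (0x03) terminator:
   the condition needs an explicit comparison against a constant. */
size_t len_etx(const char *s)
{
    size_t n = 0;
    while (s[n] != 0x03)
        n++;
    return n;
}
```

Both return 3 for "abc" followed by the respective terminator. Whether the first form is actually cheaper depends on the CPU and compiler.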

But my argument was against length prefixing somehow being "more fundamental" than having just a sequence of characters "raw" in memory addresses.


None of it really "matches reality." It's all binary numbers, and on a deeper level, voltages or magnetized particles.

0 is not a letter of the alphabet, but nor is 01000001 (ASCII 'A').

So either the first number is special, or you look for a special number to indicate the end. Neither represents reality, because the "end" of a single group of characters is visually identical to a million white-space characters that happen to fit into the emptiness that follows.

My point being, it's probably not helpful to argue which "matches reality" when they're both just abstract representations of concepts.


I was going more toward "closer to reality". But I take your point. Somewhere we're going to need extra info about the string itself, whether that extra info is a magic terminator or a magic prefix. The magic prefix gives great benefit, but also is more complex to implement if you want to store an arbitrary-length string.


Most CPUs have a flags register, and typically have a "zero" flag which is set when the result of the last operation was zero. Zero is special in the vast majority of hardware designs. Checking for null (zero) instead of another specific value often saves a few cycles. That's where the optimization of having all FOR loops count down towards zero comes from, the check saves a cycle or two each iteration on some CPUs. The same thing happens when reading from a buffer, the load instruction will set the Zero flag when the terminating null is read.
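
For illustration, the count-up and count-down loop shapes in C. The function names are invented, and whether the second form saves anything is an assumption about the target: a modern optimizing compiler may well emit the same code for both.

```c
#include <stddef.h>

/* Counts up: each iteration compares the index against n. */
long sum_up(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Counts down: the exit test is "has the counter reached zero",
   which on many CPUs can reuse the zero flag set by the decrement. */
long sum_down(const int *a, size_t n)
{
    long total = 0;
    while (n-- > 0)
        total += a[n];
    return total;
}
```

Both functions return the same sum; only the direction of the index and the form of the exit test differ.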

The difference doesn't matter much on modern (non-embedded) processors, but it did make sense at the time C was designed. It matches the most common hardware design pattern better than the alternatives.


> NUL is not a character

Somehow NUL is still an assigned character in the ASCII code table. Strange, hmmm?


Ha, the fact that I spelled it "NUL" instead of using the word "null" should have made me pause. :)

Ok, it's an ASCII character code point. One that's used to terminate strings. I meant it's not a character you'd find in the middle of a string, though I realize that's kinda tautological. Back when ASCII was developed, punch cards were used. Any row in the card that wasn't punched was a NUL. It wouldn't have made sense to have it in the middle of a string. It would be like missing a character altogether.


Why would you want to represent a null byte in a string? Is there a character encoding where the null value has a meaning?


Interestingly, some Unix command-line utilities will emit null-separated records if you pass a flag (often -0), because it's the least likely character to show up as part of the string.

find and xargs are examples of programs with this feature.

It depends a bit on what you call a "string". If you're thinking "something a human will want to read", then yeah, there's not much need to encode null. If, however, you take the looser view of "a vector of 8-bit values", then encoding null becomes important. Otherwise your system can't be 8-bit clean.

Overall I think the null terminator has caused more problems than it has solved, but prefixing the string length isn't a panacea either. You end up with systems with 256, 65536, or even 4294967296 byte limits on their strings. It's also more difficult to pass around an index into the string, so you end up either making lots of copies and possibly merging them later, or with a language cluttered with index values everywhere strings are used.
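
A rough C sketch of the copying issue, assuming an in-place layout where the length byte sits immediately before the data. `struct pstr`, `cstr_suffix` and `pstr_suffix` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* NUL-terminated: a suffix is just a pointer into the same bytes.
   The caller must ensure offset <= strlen(s). */
const char *cstr_suffix(const char *s, size_t offset)
{
    return s + offset;
}

/* Hypothetical in-place length-prefixed string. */
struct pstr {
    unsigned char len;
    char data[255];
};

/* A suffix in the same format needs its own length byte in front of
   its data, so the bytes have to be copied into a new object. */
void pstr_suffix(struct pstr *dst, const struct pstr *src, size_t offset)
{
    if (offset > src->len)
        offset = src->len;               /* clamp instead of reading past the end */
    dst->len = (unsigned char)(src->len - offset);
    memcpy(dst->data, src->data + offset, dst->len);
}
```

The alternative to copying is to pass a pointer plus a separate length around, which is the "index values everywhere" clutter.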

It's quite possible that if K&R had gone with length-prefixed strings, we would have a different class of errors, where the string index gets offset or malicious values are inserted in the length field.
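
As an illustration of that other class of errors, here is a sketch of a reader that refuses to trust the length field. The record format (2-byte big-endian length, then payload) and the name `read_record` are assumptions made up for this example.

```c
#include <stddef.h>

/* Parse a hypothetical record: 2-byte big-endian length, then payload.
   Returns a pointer to the payload and stores its size in *out_len,
   or returns NULL if the length field claims more than the buffer holds. */
const unsigned char *read_record(const unsigned char *buf, size_t buf_len,
                                 size_t *out_len)
{
    if (buf_len < 2)
        return NULL;                          /* no room for the length field */
    size_t claimed = ((size_t)buf[0] << 8) | buf[1];
    if (claimed > buf_len - 2)
        return NULL;                          /* untrusted length exceeds the data */
    *out_len = claimed;
    return buf + 2;
}
```

Leaving out the second check is the length-prefixed counterpart of a missing terminator: the reader walks off the end of the buffer.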


Elaborating on the NUL bytes on the command-line, e.g. find -print0 | xargs -0:

Using find -print0 etc. is a good idea not so much because NUL is an uncommon character (the various record separators / vertical tab / ... are no more common), but because UNIX - being a C system through and through - allows any character to appear in a file name except '/' (path separator) and NUL. Thus, NUL makes a perfect separator between filenames.
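
On the consuming side, splitting such a stream is straightforward. A minimal C sketch, assuming the whole output has already been read into a buffer; `each_nul_record` is a made-up helper, not part of find or xargs.

```c
#include <stddef.h>
#include <string.h>

/* Walk a buffer of NUL-terminated names, such as `find -print0` output.
   Calls visit(name) for each record if visit is non-NULL and returns the
   number of complete records found. */
size_t each_nul_record(const char *buf, size_t len,
                       void (*visit)(const char *name))
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        if (!nul)
            break;              /* trailing bytes with no terminator: skipped */
        if (visit)
            visit(p);           /* p is a valid C string up to nul */
        count++;
        p = nul + 1;
    }
    return count;
}
```

A name containing a space or a newline comes through as one record, because only the 0 byte is treated as a separator.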


Then it sounds like 'find' f'ed up, if, when these things are passed around, they are not escaped properly (not saying this is the case). Just like today with various charsets, whenever there is a charset boundary, say between bytes and C library strings, which is what this is, there has to be a charset conversion.


By default, find separates by newline; this is human-friendly, but breaks if an attacker/script/... puts a newline in the filename.

The UNIX filesystem, qua filesystem, doesn't have a character set, just NUL-terminated strings. On the plus side, it's simple to handle, and means that retrofitting UTF-8 or another encoding is pretty easy. On the downside, two bytestrings that Unicode-canonicalize to the same value may name different files, which is surprising for humans.
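
A small C illustration of the canonicalization point, assuming a filesystem that compares names byte for byte (some filesystems normalize names, so this does not hold everywhere). The constant and function names are invented for the example.

```c
#include <string.h>

/* "café" with é as the single code point U+00E9 (UTF-8 bytes C3 A9). */
static const char NFC_NAME[] = "caf\xC3\xA9";

/* "café" with é as 'e' plus combining acute U+0301 (UTF-8 bytes CC 81). */
static const char NFD_NAME[] = "cafe\xCC\x81";

/* Byte-for-byte comparison, which is all a bytestring name lookup does. */
int same_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

The two names render identically but are 5 and 6 bytes long respectively, so `same_bytes` reports them as different.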

It's notable that many of early UNIX' competitors were much more full-fledged systems, featuring full-fledged record-oriented files and typed data instead of UNIX' bytestrings-everywhere approach.


Because you just read it from a file and don't want to corrupt it?


Could it be some extreme form of muscle memory?


Your assumption is right, once the shared library has been loaded into the process address space. You can also set a breakpoint on module load.


Depends on how you define fault here. Is it the mere logical correctness of the code, or any error that might occur when the code executes? If we follow the former definition of fault, then they are both similar in being "fault-proof". If we follow the latter definition, then the execution environment of the program comes into play, and it depends on so many variables and their complex interplay that it becomes much harder to provide any formal guarantees of fault tolerance.

However, there are execution environments (read: RTOSes) that try to be fault tolerant. Various virtual machines, like the JVM and the CLR, also try to provide fault tolerance to varying degrees.


Do you have data supporting your claim? I know a lot of people who are really good and continue to work in India. In fact, I have also seen a lot of good people coming back to India (over the last 2-3 years). So things are gradually improving for the better.


The very first thing you need to do is pick a software stack, ideally an open-source one, and then learn how that stack works from the bottom up. Learning how a Linux application works might be a good starting point if you are totally new.

There are multiple layers involved here and really understanding each one would take time.

Next would come understanding browsers. A browser, although it is an app, is a world in itself: how an HTTP request flows through a browser, how an HTTP response is rendered, and which layers are involved, from the TCP/IP stack down to the physical layer (Wi-Fi, USB). It is extremely vast and very interesting.

And once you have gained enough experience, you will be able to clearly see the similarities and differences between various software stacks, both bottom-up and top-down, right from the hardware level to your application's code and vice versa. Reasoning about the security of the stack at its various layers then becomes straightforward.

In terms of conferences, I find the Blackhat conference (http://www.blackhat.com/) a very good way of keeping oneself up to date with the world of security (including applying data science to security).


Isn't this scary?


With how much people hate the sound of their own voices, I think that would backfire on the advertiser.


If they could make it sound like my voice sounds from inside my head, and not like the weirdo that other people apparently hear, it would be scary on several levels.


Subvocalisation. Your own voice speaking quietly in the background to something else. Ideal for consumer indoctrination.


The voice we hear in our head and the one everyone else hears are starkly different.


The voice I 'hear' in my head and the one I actually hear when talking are starkly different.


So, ads spoken with your voice, as you would hear it from your head.

