I can now read data arrays that are around 16 exabytes large. (Obviously you can store more if you divide things into more than one array.) I can make very large arrays too, but only in the range of thousands of gigabytes, since the pointers eventually become so large that you can't realistically keep them in memory. I will remedy that tomorrow, when I will work on being able to build up large data sets over time rather than define them in one go. This is exhausting, and it makes my brain hurt, but I'm making progress. I know it's going to get a lot worse once I start with regression. I've been debating whether to allow insert and delete or only overwrite, and I have come to the conclusion that insert and delete will probably be necessary, and that I will get some performance gains by supporting it at the lowest level. Much like an MMU can fiddle with the address space when you call realloc, I can do the same with the chunks that make up large data sets.
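A minimal sketch of what I mean, in plain C (this is not the actual implementation, and the chunk size and names are made up): the data set is split into fixed-size chunks, and an insert or delete only shuffles a table of chunk pointers, so the bytes themselves never move.

#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE (64 * 1024) /* hypothetical chunk size */

typedef struct {
	unsigned char **chunks; /* table of pointers to CHUNK_SIZE blocks */
	size_t chunk_count;
} ChunkedData;

/* Insert a new chunk at position `at`; only the pointer table is shuffled,
 * none of the existing data moves. */
static int chunked_insert_chunk(ChunkedData *d, size_t at, const unsigned char *data)
{
	unsigned char **table;
	unsigned char *chunk;

	if (at > d->chunk_count)
		return 0;
	chunk = malloc(CHUNK_SIZE);
	if (chunk == NULL)
		return 0;
	memcpy(chunk, data, CHUNK_SIZE);
	table = realloc(d->chunks, (d->chunk_count + 1) * sizeof *table);
	if (table == NULL)
	{
		free(chunk);
		return 0;
	}
	d->chunks = table;
	memmove(&d->chunks[at + 1], &d->chunks[at], (d->chunk_count - at) * sizeof *d->chunks);
	d->chunks[at] = chunk;
	d->chunk_count++;
	return 1;
}

/* Delete the chunk at position `at`; again only pointers move. */
static void chunked_delete_chunk(ChunkedData *d, size_t at)
{
	if (at >= d->chunk_count)
		return;
	free(d->chunks[at]);
	memmove(&d->chunks[at], &d->chunks[at + 1], (d->chunk_count - at - 1) * sizeof *d->chunks);
	d->chunk_count--;
}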
You might think that if you were to redesign the Internet, you would make it so that everyone has an identity backed by a cryptographic key pair. You would be wrong. You wouldn't have one key, you would have thousands. First of all, you are multiple people: one for your family, at least one for your job, another one for whatever kink you are into, and so on and on. Often it's a good idea to keep your identities apart. If you live in some countries, your life may depend on it. So yes, a few identities, but thousands?
It's not just people who have identities; so do things and software. You may want to be able to know what software produced what data and what data it has access to. Data itself can also have an identity. In Unravel, all data is identified by hashes. This means that if you have a file and I have the same file, our computers can talk to each other about that file. We can, for instance, reference an article and know that we are both referencing the same article. No need for a signature for any of this, just a simple hash.
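To illustrate the idea only (this is not Unravel's hash; a real system would use a cryptographic hash such as SHA-256), here is a toy content-ID function using FNV-1a. Identical bytes always produce the same identifier, so two machines can agree on what data they mean without any signatures.

#include <stdint.h>
#include <stddef.h>

/* Toy content addressing: the identity of a piece of data is a hash of its bytes. */
uint64_t content_id(const unsigned char *data, size_t size)
{
	uint64_t hash = 14695981039346656037ULL; /* FNV-1a offset basis */
	size_t i;
	for (i = 0; i < size; i++)
	{
		hash ^= data[i];
		hash *= 1099511628211ULL; /* FNV-1a prime */
	}
	return hash;
}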
Sometimes we want to reference the Guardian's front page and we mean a very specific front page at a specific moment in time. Other times, however, we want to reference the Guardian's front page as an evolving publication that keeps changing again and again. To do this we need a cryptographic key that keeps signing the hash of the current front page. For anything that we want to be able to change, we need an identity. Most things change, so most things need an identity, so lots of identities.
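A rough sketch of what such a mutable reference could look like as data. The field sizes here (an Ed25519-style 32-byte key, a 32-byte content hash, a 64-byte signature) are my assumptions for illustration, not Unravel's actual format.

#include <stdint.h>

/* The identity of the publication is the public key; each new version is
 * simply a new signature over the hash of the current content. */
typedef struct {
	uint8_t public_key[32];   /* identity that owns the publication */
	uint8_t content_hash[32]; /* hash of the current front page */
	uint8_t signature[64];    /* signature over content_hash by the key */
} MutableReference;

/* To follow the publication you keep the public key; to get the front page
 * "right now" you fetch the latest signed content_hash and verify the
 * signature with a real signature library (omitted here). */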
Right now I'm working on this universal link system where you can reference either a publication or a hash. This universal pointer will be 256 bytes long and contain a 64-bit length value. I also have a trick where, if the data itself is smaller than the checksum, the data can be stored directly in place of the checksum. This creates a natural 248-byte limit for data that can be embedded into links. I'm sure someone will find some great significance in that number that speaks to the human condition. I expect it to optimize network traffic. Speaking of optimizations, all these links are implemented as single memory allocations. This means that I can't use structs, but have to cast offsets into the memory myself. It does give me that warm low-level feel only cache coherency can give.
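As a sketch of that layout (the offsets are inferred from the description above, not taken from the actual code): one 256-byte allocation, an 8-byte length at offset 0, followed by 248 bytes that hold either the checksum or, for small data, the data itself.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LINK_SIZE          256
#define LINK_LENGTH_OFFSET 0   /* 64-bit length of the referenced data */
#define LINK_BODY_OFFSET   8   /* 248 bytes: checksum, or embedded data */
#define LINK_EMBED_LIMIT   (LINK_SIZE - LINK_BODY_OFFSET) /* 248 bytes */

/* Create a link in a single allocation; small data is embedded directly. */
unsigned char *link_create(const void *data, uint64_t length)
{
	unsigned char *link = calloc(1, LINK_SIZE);
	if (link == NULL)
		return NULL;
	memcpy(&link[LINK_LENGTH_OFFSET], &length, sizeof length);
	if (length <= LINK_EMBED_LIMIT)
		memcpy(&link[LINK_BODY_OFFSET], data, (size_t)length);
	/* else: the checksum of the data (computed elsewhere) goes in the body */
	return link;
}

/* Read the length back out by offset, since there is no struct to rely on. */
uint64_t link_length(const unsigned char *link)
{
	uint64_t length;
	memcpy(&length, &link[LINK_LENGTH_OFFSET], sizeof length);
	return length;
}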
That was fast. Apparently the Internet doesn't like to be changed. My Internet went down today. Oh well. Visual Studio still likes me, so work continues.
I'm starting a new project today. I'm going to redesign the Internet. I don't like the Internet as it is, so I'm redesigning it from the ground up. I have been planning to do so for a while, but now I'm going to put some real time into it. I'm starting pretty low-level: I'm building a database to store all the necessary data. Why transfer data if you can't store it, right?
It needs to be secure, fast, thread-safe, and easy to use. As a C programmer I'm not really sure what kind of fluffy ease of use today's high-level programmers expect, but I want direct memory access where you can get a pointer straight into a block of memory (did I mention that I want it to be low-level fast?), yet I also want to be able to handle data sets way larger than can fit into memory. What's the best way to get a pointer into a 16-terabyte file?
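One common answer, sketched here under POSIX and not necessarily the approach I'll end up with, is to memory-map the file and let the OS page it in on demand; on a 64-bit machine the address space is large enough that a single pointer can span the whole file.

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *path = "huge.dat"; /* hypothetical multi-terabyte file */
	struct stat st;
	unsigned char *data;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;
	/* The pointer covers the whole file; pages are faulted in lazily. */
	data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (data == MAP_FAILED)
		return 1;
	printf("first byte: %u\n", data[0]);
	munmap(data, (size_t)st.st_size);
	close(fd);
	return 0;
}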
Right now I'm trying to wrap my head around what limitations I'm required to live with. I don't want to be the guy who thought no one would ever need more than 640K. I'm considering having a limit of 8-exabyte files. I hope that's OK. This is one of the reasons why this is an interesting project. It's easy to build something when you know what it is going to be used for. I'm building something that is meant to create things we can't yet imagine.
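My reading of that figure (an assumption on my part, not stated above) is that 8 exabytes is what a signed 64-bit byte offset can address, since 2^63 bytes is 8 EiB.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* INT64_MAX is one byte short of 2^63, i.e. just under 8 EiB. */
	printf("max signed 64-bit offset: %lld bytes\n", (long long)INT64_MAX);
	return 0;
}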