Elegance!
I love the fact that the words insert and remove have exactly the same number of characters! Makes the code look good!
After starting this blog a few months ago I wrote two articles. After writing those articles I shared them on a forum and that was it. I didn’t have time to write more stuff until now.
While the blog was online I received some spam comments. I didn’t have a captcha or any other method to stop spam, so the total number of spam comments reached almost 150. That sucks, of course.
It was obvious that the comments were spam because almost all of them had the same structure and contained text that made no sense in that particular topic or on this particular blog. They also had some random site attached, as if every user who read what I wrote would attach a website to their comment.
I decided to investigate the comment data and see if I could reach some conclusion. And I did.
To start all this I stopped the spam with a captcha, of course. The spam did stop which just further proves it was being generated by a bot.
After that I had to extract the information from the WordPress database. For that I made a PHP script that went through the database and wrote everything to a file in a specific format. The script is linked at the bottom of the post.
After this I had a file with many things. First, the IPs, names and emails of the comments, that is, the information about the person who supposedly wrote each comment. Although I now have a ton of personal emails, I don’t plan on doing anything with them, and in the end this information is not very useful. The comments themselves? LOL! Just random trash. The only really useful thing in all this was the attached websites.
If a blog is going to be spammed, then the spam is probably going to contain advertising. The only thing these comments advertised were these sites. One thing I did find strange is that almost all of these websites had a Portuguese or Brazilian Portuguese domain name while the comments were in English. To find out more about those websites I simply typed their addresses into the browser. I was a bit surprised by what I saw. Almost all of these websites were identical, changing only one image and their name. They were, of course, just trash. A complete list of these websites, digested from the raw information, is in a file linked at the bottom of the post.
I decided to check the IPs of these websites, so I made a simple Python script that counted how many sites shared the same IP address. Link at the bottom of the post.
The result is this:
    91.121.135.69: 1,
    141.105.66.25: 1,
    91.213.206.226: 1,
    80.82.17.194: 2,
    72.233.90.42: 54,
    72.233.80.210: 3,
    207.7.86.107: 1,
    127.0.0.1: 1,
    98.124.198.1: 1.
This means that, out of all the sites advertised in the spam comments, 54 had the same IP. After this I just executed the command:
    whois 72.233.90.42
According to the whois output, this IP address is hosted by the company Layered Technologies, Inc. (website: layeredtech [DOT] com). This company is a hosting company.
So basically my conclusion is that a hosting company hosts a lot of those one-page websites that just contain trash. A bot then spams WordPress blogs, and perhaps other sites, to advertise those websites.
This is most probably done for search engine optimization purposes. But spam will be spam.
Links:
PHP script to extract information from WordPress database.
List of websites.
Digested counting websites by IP
Many people today are beginner programmers. They have just recently learned how to write the most basic programs in a language like Java, Python or C. Like almost every programmer who once didn’t even know what a variable is, they are sometimes puzzled about how one goes from simple programs that only write stuff in the terminal to more complicated programs that have an interesting GUI (Graphical User Interface), use databases or networking features, run on Android or other systems, etc.
For a beginner, any of these subjects may be a little hard at first, because a person may not know anything about databases or networking. What normally happens is that new programmers search Google for how to use these technologies.
Let us consider a programmer who up until now has only written basic programs that read and write stuff in the terminal. This person goes to Google and types something like “How to make programs with graphical interfaces”. After searching for a while, they find out that what they are looking for is how to create a GUI. After that they will learn, for example, that they can use Qt or GTK+ (or some other toolkit or framework) in their C/C++ programs.
After some tutorials and a few hours trying to compile a few programs, one starts writing simple programs with a few buttons and a few simple features. And now comes the documentation.
The new programmer knows that he can use some tools to create GUIs, but he is still amazed by the complexity of some of the widgets in the toolkit (click here to know what a widget is), and every time he downloads an example program with those widgets off the Internet he just can’t understand what all those functions in the code do. He will try to mess around with the code, making small alterations, but even if he understands the basics he is left with the same amount of knowledge he had, and doesn’t really understand how the widgets work.
Now comes the lesson. Toolkits, development frameworks, libraries, programming languages, etc., need to be documented. What is documentation? Documentation is what each of these tools provides to explain how it works and how it should be used. There are two main reasons why documentation exists. First, it is how programmers learn to use almost anything in order to make something new. If something isn’t documented then it can hardly be used, because it would be very frustrating and time consuming for a programmer to figure everything out for himself. And how do we use documentation? Take this example:
Take the function printf() from the standard library of the C programming language. It is one function that can be used to write text in the terminal. Suppose you didn’t know how to use this function but needed to. Ordinarily, because this is something so simple, you would probably just search Google for some basic C program or a beginner’s tutorial and learn that way. There is nothing wrong with that, of course. But if you wanted to know everything, or almost everything, about this function, you would need to go to its documentation. Here it is:
http://www.kernel.org/doc/man-pages/online/pages/man3/printf.3.html
So this may be a very boring read, and sometimes the more technical terms may confuse some people, especially beginners. But it is by reading that page that you are going to learn how to use printf, and may also learn about other functions that are similar to printf but serve different purposes. Those functions may even be better for you to use with your current problem or some other problem.
As you can see, the page for printf has a lot of information. A programmer doesn’t need to know all of it to use printf, of course. In fact, the most common uses of the function by far, at any level of programming, may amount to just a quarter of the page or less. But the point is that everything about the function is there. This leads me to the second main reason for documentation to exist.
Imagine now that instead of wanting to learn how to use printf you wanted to know how to use a widget of a GUI toolkit (a slightly more realistic situation). A widget may be very complicated. It may have a lot of methods and attributes (or functions and variables, if you prefer those terms) and, most probably, it is impossible for a person to remember every one. Below is a link to the documentation page for a simple widget of the Qt development framework, for C++:
http://developer.qt.nokia.com/doc/qt-4.8/qpushbutton.html
The widget is the QPushButton. It is a simple button that a programmer places on other widgets to perform various actions. Reading this page, we can see that there are some methods in this class (and believe me, in this framework there are many more widgets and other classes that have way more methods than QPushButton). If we read further we can also see that there are a lot more methods that come from inheritance. So the whole documentation for this simple widget is actually quite big.
Because of the size of the documentation, it would be ridiculous to expect a programmer to memorize this information, or even just a part of it. In a “real” situation where a programmer is working with some framework or library, it’s not important to know from memory everything a widget can do or, for example when programming in C, every last detail of the standard library. It would just be humanly impossible to read the whole documentation and actually retain everything important in it.
For this reason, documentation is mostly used just when needed (again, it’s used on the fly). What I mean is this: if I am programming in Python, for example, and I need to use a mathematical function like sin(), I’ll just go to the documentation on this page:
http://docs.python.org/library/math.html?highlight=math.sin#math.sin
By reading about the other functions Python has for common mathematical situations, I will get to know that part of the language a little better. I may apply what I learned in my current project and the whole thing will be better programmed. Of course, if for some reason I am in a hurry, or just don’t feel like reading much, I’ll use the next best thing: either I’ll ask on a website like Stack Overflow, or I’ll just use Google! Googling or asking about more specific or complicated issues may not be a very good option, though. In that case just read the documentation.
In conclusion: use documentation to understand what you are working with, so that you can use programming languages, frameworks, tools, etc., in a correct and conventional way, and always try to consult the documentation when working with something. This is how programmers get used to new technologies and become able to build projects with them. As a last resort, or as a quick and dirty fix for some small issue, either ask around or Google!
In this article I will be talking about the segmentation fault, a bug that is very common in programs written in the C programming language.
When a program written in C or C++ crashes, it is very common for its last line in the terminal to be something like:
    Segmentation fault
This type of bug can be very frustrating for programmers because there are many situations where it occurs and the compiler can’t “guess” where those situations are. For beginners, or programmers who don’t know the inner workings of a compiled language, tracking down the cause of a segmentation fault may become a very tedious task. It’s usually the point where the programmer just modifies one random line of code at a time and compiles over and over again until the segmentation fault goes away, without really understanding how the bug was solved.
As I said above, there are many ways for a segmentation fault to occur, but only two main reasons for it to happen: a program tries to write to a memory address that isn’t supposed to be written, or to read from a memory address that isn’t supposed to be read.
Before going into details about the bug, you need a basic understanding of virtual memory.
Programs written in assembly or machine language have to work directly with memory. This means that there are no variables or pointers of type int or double called x or counter; there are only memory locations distinguished by their hexadecimal address (like 0xfffffd7f, on a 32-bit machine).
Instead of working with the addresses of the physical memory, the programmer uses an address space between 0x00000000 and 0xffffffff, a total of 4 GB of memory. This address space is where the instructions of the program are stored (that is, the instructions the processor can understand), where the variables are stored (local variables, arrays, etc.) and also where some parts of the operating system are. This address space is the same for every program of the same architecture. With this in mind, one may think that, in a multi-tasking environment, if two programs are running on the same computer at the same time, then the content at address 0x080fff40 of one program is the same as the content at address 0x080fff40 of the other program. But no. What happens is that the virtual memory of each program (or process or task), even though the address spaces are equal, is actually stored in different locations in the physical memory. This way, one program can’t write in the address space of another program.
Because of the limitations of a computer, keeping all 4 GB of virtual memory of every running program in physical memory would be a very inefficient way of managing the memory of a multi-tasking system. To avoid this, virtual memory is divided into segments, and only the segments that are being used, that is, that have been allocated, are stored in the physical memory. Some segments may be shared between different programs. This happens with some parts of the OS code.
So now we can start understanding what a segmentation fault is. Take a look at the following code.
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int *x;                     /* never initialised: points to a random location */
        printf("%p\n", (void *)x);
        *x = 10;                    /* writes through the random pointer */
    }
When I executed this, I got this output:
    0x9d2ff4
    Segmentation fault
What happened is that I didn't initialise the pointer x, so it is pointing to a random location. That random location, 0x009d2ff4 (with or without the two leading 0s the address is the same), can't be used by the program, either because it is in an area that can't be written or because it hasn't been segmented. So the processor can't write the value 10 to the physical memory corresponding to 0x9d2ff4. An example of addresses that can't be written are the ones that hold the program's instructions.
Most of the time, a segmentation fault also occurs when someone dereferences a pointer that has been set to 0x0.
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int *p, x;
        p = &x;
        *p = 0;
        printf("%p - %d\n", (void *)p, *p);
        p = 0x0; // p will now point to 0x0, (nil)
        printf("%p - %d\n", (void *)p, *p); // reading *p crashes here
    }
The output:
    0xbf991cc8 - 0
    Segmentation fault
In this case, the program tries to read what's at 0x0 in order to pass it to printf. Because reading this location is not allowed, the program crashes with a segmentation fault.
Another common way this bug occurs is with arrays, like in the following code:
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int x[5];
        x[0] = 10;
        x[5] = 90;   // out of bounds: one past the last element
        x[-1] = 40;  // out of bounds: one before the first element
        printf("%p - %d\n", (void *)&x[0], x[0]);
        printf("%p - %d\n", (void *)&x[5], x[5]);
        printf("%p - %d\n", (void *)&x[-1], x[-1]);
        printf("%p\n", (void *)&x[10000]);
        printf("%d\n", x[10000]); // far out of bounds: crashes here
    }
The output:
    0xbfd0370c - 10
    0xbfd03720 - 90
    0xbfd03708 - 40
    0xbfd0d34c
    Segmentation fault
Above, an array x is declared. Because it's an array of size 5, its valid indexes are only 0 to 4. But when we write to x[5] or read from x[5], the operation happens without any visible problem. Because an array is laid out contiguously in memory, every element is right after or before another. The address of any element is calculated relative to the address of the first element. So x[2] is actually *(x+2), x[4] is *(x+4), and so on. (This works for any type of array, whether char, double or an array of a data structure that you declared yourself. There is no need to do something like *(x+sizeof(int)*2) because the compiler scales the offset automatically.)
Because the addresses of the elements are calculated this way, x[-1] and x[5] are no different from *(x + -1) and *(x+5). The resulting address belongs to something other than the five-element array that was declared, but not necessarily to a location that was not segmented or that the program can't write or read. This, however, may be a terrible security flaw, because it's exactly this kind of situation that leads to a common exploit called buffer overflow. To know more about buffer overflows read this article.
Returning to the example, when x[10000] gets read, the program accesses location 0xbfd0d34c. This location, unlike x[-1] or x[5], may be at an address that doesn't allow reading or that is not segmented. Like before, the program crashes with a segmentation fault.
One last situation that can cause a segmentation fault is a stack overflow. (If you don't know what a stack is, visit this article.)
In the virtual memory space, there is a region that is reserved for the stack. Every time a function gets called, its local variables, arrays, etc., get stored in a space on the stack, along with other information necessary for the correct execution of the code. Every time a new function gets called, a new space on the stack gets allocated right after the last space used. So the stack grows. Every time a function returns, the stack shrinks.
The space the stack can occupy is limited, and if enough functions get called, the stack may run out of space to grow. This situation is known as stack overflow and when it happens the program crashes causing a segmentation fault.
Stack overflows are almost always caused by a recursive function that keeps calling itself without end, like in the code below:
    void stack_overflow()
    {
        stack_overflow();
    }

    int main(int argc, char *argv[])
    {
        stack_overflow();
        return 0;
    }
The code just calls the function stack_overflow over and over and never stops. This will only lead to one output: segmentation fault.
To summarize: when a segmentation fault occurs, it is probably because the programmer made a mistake with pointers or because a stack overflow occurred. So next time you encounter this problem you'll have a good idea of the cause. To track down problems like this, use a debugger. If you are working in a Linux environment, use gdb. It's not really easy for beginners to use, but it is surely a great tool.