
Typecasting: Inside the compiler

August 24th, 2010

Filed under: C++ — Kai @ 12:21 pm

I was unsure how type casting works inside the compiler, and whether it can happen without loss of data.

For example:

 int i = 10;
 UINT k = (UINT) i;

 float fl = 10.123;
 UINT  ufl = (UINT) fl; // data loss here?

 char *p = "bla blub";
 unsigned char *up = (unsigned char *) p;

Now I was wondering: how does the compiler handle this kind of typecasting?

Well, first of all I have to say that a cast is an explicit request to convert a value of one type to a value of another type. A cast to a non-reference type always produces a new object, a temporary returned by the cast operator. Casting to a reference type, however, does not create a new object: the referenced object is reinterpreted as an object of a different type.
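To make the difference concrete, here is a minimal C++ sketch (the function names are hypothetical, chosen only for illustration): a cast to a value type yields a separate object, while a cast to a reference type is just another view of the same storage.

```cpp
#include <cassert>

// Cast to a value type: the result is a new object initialized from i.
int modify_value_cast(int i) {
    long copy = static_cast<long>(i);  // new object, separate from i
    copy += 1;                         // changes only the copy
    (void)copy;                        // silence "set but not used" warnings
    return i;                          // i still holds its original value
}

// Cast to a reference type: no new object, just another view of i's storage.
// int and unsigned int are corresponding signed/unsigned types, so this
// access is permitted by the aliasing rules.
int modify_reference_cast(int i) {
    unsigned int& alias = reinterpret_cast<unsigned int&>(i);
    // Same address: the cast did not create a new object.
    assert(static_cast<void*>(&alias) == static_cast<void*>(&i));
    alias = 5u;                        // writes into i itself
    return i;                          // now 5
}
```

Calling `modify_value_cast(10)` returns 10, while `modify_reference_cast(10)` returns 5. Aliasing an `int` through an unrelated type such as `float&` would not be allowed; the example deliberately sticks to the signed/unsigned pair.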

Now to your question. Note that there are two major types of conversions:

  • Promotions: These can be thought of as casting from a possibly narrower type to a wider type. Casting from char to int, short to int, or float to double are all promotions.
  • Conversions: These allow casting from long to int, int to unsigned int, and so forth. They can in principle cause loss of information. There are rules for what happens if you assign -1 to an object of unsigned type, for example. In some cases, a wrong conversion can result in undefined behavior: if you assign a double that is larger than what a float can store to a float, the behavior is undefined.
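Both kinds can be observed directly in a small C++ sketch (the helper `to_unsigned` is a hypothetical name): the `static_assert`s check at compile time that `char` and `short` operands are promoted to `int`, and the function shows the well-defined wrap-around when a negative `int` is converted to `unsigned int`.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Promotions: char and short operands are promoted to int before arithmetic,
// so the type of these expressions is int. Checked at compile time.
static_assert(std::is_same<decltype('a' + 'b'), int>::value, "char + char yields int");
static_assert(std::is_same<decltype(short(1) + short(2)), int>::value, "short + short yields int");

// Conversion int -> unsigned int: values that fit are kept as-is; a negative
// value wraps around modulo UINT_MAX + 1, so -1 becomes UINT_MAX.
unsigned int to_unsigned(int value) {
    return static_cast<unsigned int>(value);
}
```

With this, `to_unsigned(10)` is `10u` and `to_unsigned(-1)` is `UINT_MAX`, which is exactly the "rule for assigning -1 to an unsigned object" mentioned above.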

Let’s have a look at the casts:

int i = 10;
unsigned int k = (unsigned int) i; // :1

float fl = 10.123;
unsigned int  ufl = (unsigned int) fl; // :2

char *p = "bla blub";
unsigned char *up = (unsigned char *) p; // :3

1. This cast causes a conversion. No data is lost, since 10 is guaranteed to be representable by an unsigned int. If the integer were negative, the value would wrap around modulo the maximum value of an unsigned int plus one, so -1 becomes UINT_MAX (see 4.7/2).

2. The value 10.123 is truncated to 10, which obviously does cause loss of information. Since 10 fits into an unsigned int, the behavior is defined.

3. This one requires more attention. First, there is a deprecated conversion from a string literal to char*, but let's ignore that here. More importantly, what happens if you cast to a pointer to an unsigned type? The result is actually unspecified per 5.2.10/7 (note that in this case the C-style cast has the same semantics as reinterpret_cast, since that is the only C++ cast able to perform this conversion):

A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an rvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.

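So you are only safe to use the pointer after you cast it back to char * again. (C++11 later tightened this rule: for standard-layout types such as char and unsigned char, the conversion is defined as static_cast<unsigned char*>(static_cast<void*>(p)), so the result points to the same byte.)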
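The three casts can also be written with the C++ named casts, which make each intent explicit. This is a sketch only: the helper names `cast1`, `cast2`, and `cast3_round_trip` are made up for illustration, `const char*` is used to sidestep the deprecated literal-to-`char*` conversion, and the assert in `cast2` is a rough range check that assumes a 32-bit `unsigned int`.

```cpp
#include <cassert>
#include <climits>

// :1 int -> unsigned int. 10 is representable, so the value is preserved;
// a negative value would wrap around modulo UINT_MAX + 1 (4.7/2).
unsigned int cast1(int i) {
    return static_cast<unsigned int>(i);
}

// :2 float -> unsigned int truncates toward zero (10.123f -> 10). This is only
// defined if the truncated value fits; the assert is a rough range check
// (for a 32-bit unsigned int, UINT_MAX rounds up to 2^32 as a float).
unsigned int cast2(float fl) {
    assert(fl > -1.0f && fl < static_cast<float>(UINT_MAX));
    return static_cast<unsigned int>(fl);
}

// :3 char* -> unsigned char* requires reinterpret_cast. Under C++03 5.2.10/7,
// only the round trip back to the original type is guaranteed to give back
// the original pointer value.
const char* cast3_round_trip(const char* p) {
    const unsigned char* up = reinterpret_cast<const unsigned char*>(p);
    return reinterpret_cast<const char*>(up);
}
```

`static_cast` covers the two arithmetic conversions; `reinterpret_cast` is needed only for the pointer case, which makes the risky spot easy to find when searching the code.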

foreach loop with index

May 19th, 2010

Filed under: .NET — Kai @ 3:49 pm

I was fed up with declaring a “helper” variable for every foreach loop in which I needed a counter.
The solution is very cool, I think:

The list of strings I'd like to iterate over:

List<string> myStrings = new List<string>()
            {
                "abc", 
                "def",
                "xyz"
            };

Old style:

            int index = 0;
            foreach (string s in myStrings)
            {
                Console.WriteLine("{0}: {1}", index++, s);
            }

A cool way, using the LINQ and lambda syntax of .NET 3.5:

            foreach (var o in myStrings.Select((x, i) => new { x, i }))
            {
                Console.WriteLine("{0}: {1}", o.i, o.x);  
            }

Dr Watson has died

October 29th, 2009

Filed under: General Programming, Software, Windows — Kai @ 2:58 pm

If you need a crash dump on Windows Vista, you can look for DRWTSN32.EXE forever and a day. Unfortunately, it's in vain: Dr Watson has died…

Nevertheless, that app was very important as a complementary quality-control tool.

Vista has made all of this a lot more intricate. By default, Vista does not save minidumps, but it keeps a record of every crash or error report. If an application is registered with WER (Windows Error Reporting), a crash dump is stored as well; if it is not, there is no crash dump in the dump folder.

This is how it works on Vista (as far as I could figure out):

  • WER only creates a crash dump for WER-signed applications, or if the WER server requests a report.
  • To get a minidump, the following registry value has to be set:
    under HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting, create a DWORD value named “ForceQueue” and set it to 1.
  • Dumps are stored in the user directory C:\Users\TheUserName\AppData\Local\Temp and in C:\ProgramData\Microsoft\Windows\WER\ReportQueue. The file extension is .mdmp.
  • You can get an overview at Control Panel -> Maintenance -> Problems & Solutions -> Check Problems.

You can also find several instructions on the web on how to install Dr Watson on Vista. There also seems to be a tool called DrVista that might help you, though I haven't tried it…


How to read a technical book to remember most of it?

August 12th, 2009

Filed under: Books — Kai @ 11:34 am

First of all, I'd like to say that there is, of course, no single approach that works for everyone.

Technical books keep getting thicker and thicker, and the pressure from the technical community to read them and remember the many concepts they describe keeps growing. But it's so hard to do.

How do you remember all that stuff? Lots of people use index cards with the basic info, so that looking at a card reminds them of the details.

Sometimes I read a chapter, wait a while, and go back through the chapter with a highlighter. Then, if I ever have to go back in the book, I can just read the highlighted nuggets. But that’s just something you can do with your own books…

Generally, technical books come with lots of little code snippets or exercises. I always try (if there's sufficient time) to do them. But after you've done them, think about where you can apply those concepts in code you've already written. Go back and refactor with those ideas in mind. Once you see how it works in a real-life project, it will be burned into your brain.

A really good coder once told me: if you read something in a technical book that you don't understand, do something with that concept five times in code.
Make five separate, little, unrelated one-off projects. By the second time around you will probably understand it, and the other three are just to get your fingers used to typing it. Don't proceed further in the book until you do.

If you do not practice it, you will not learn it, unless you're Rain Man, and nobody is Rain Man except Rain Man. ;)

The learning-by-doing principle is good, but my experience tells me that it is not enough. Moreover, remembering is no longer mandatory: resources are so accessible that knowing how to find them matters more than memorizing them.

Short precious code snippet

August 12th, 2009

Filed under: .NET — Kai @ 12:45 am

I was recently talking about LINQ features with a friend. Then I saw that somebody wanted to break out of a loop after 50 iterations. Trivial, but even for that there's a neat solution using LINQ.

int processed = 0;
foreach(ListViewItem lvi in listView.Items)
{
   //do stuff
   ++processed;
   if (processed == 50) break;
}

Using LINQ:

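// ListView.Items is a non-generic collection, so it needs Cast<T>() before Take()
foreach(ListViewItem lvi in listView.Items.Cast<ListViewItem>().Take(50))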
{
    //do stuff
}

Or, you're right, the “old” style would be:

for(int i=0; i < listView.Items.Count && i < 50; i++)
{
   ListViewItem lvi = listView.Items[i];
  //do stuff
}

E-Mail obfuscation - a disputed question

August 11th, 2009

Filed under: General Programming, Internet, Security — Kai @ 5:15 pm

In an attempt to make automatic e-mail address harvesting harder, many users and forum programs conceal addresses via obfuscation: @ is replaced with “at” and . is replaced with “dot”, so

bill.gates@microsoft.com

now becomes

bill dot gates at microsoft dot com

I'm not an expert in regular expressions, and I'm really curious: does such obfuscation really make automatic harvesting harder? Is it really much harder to automatically identify such obfuscated addresses?

For example, if every e-mail address on a large community site is reversed in the markup and rendered properly with CSS, or token-replaced (@ becomes ‘at’), or obfuscated by any other predictable method, the harvesters will just write a thin adapter for your site.

Think of it this way: if it only takes you one line of code to “scramble” them sitewide, it will only take the harvester one line of code to “unscramble” them for your site. Roughly speaking.
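The symmetry is easy to demonstrate. The following C++ sketch (the names `munge` and `unmunge` are hypothetical) scrambles an address with the “at”/“dot” scheme and then undoes it with a couple of `std::regex_replace` calls, which is roughly all a harvester's adapter needs.

```cpp
#include <cassert>
#include <regex>
#include <string>

// "Scramble": bill.gates@microsoft.com -> bill dot gates at microsoft dot com
std::string munge(const std::string& address) {
    std::string out = std::regex_replace(address, std::regex("@"), " at ");
    return std::regex_replace(out, std::regex("\\."), " dot ");
}

// "Unscramble": the harvester's thin adapter, undoing the replacement above.
// Case-insensitive, so "AT" and "DOT" are handled too.
std::string unmunge(const std::string& text) {
    static const std::regex at_re(R"(\s+at\s+)", std::regex::ECMAScript | std::regex::icase);
    static const std::regex dot_re(R"(\s+dot\s+)", std::regex::ECMAScript | std::regex::icase);
    std::string out = std::regex_replace(text, at_re, "@");
    return std::regex_replace(out, dot_re, ".");
}
```

Run over ordinary prose, `unmunge` would also turn phrases like "look at this" into "look@this"; a harvester doesn't care, since it only keeps whatever then matches an address pattern.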

Which approach is the right one? More complex obfuscation, or looking for new ways altogether?

Obfuscation techniques fall into the same category as CAPTCHAs: they are not reliable and tend to hurt regular users more than bots.

JavaScript obfuscation is often praised, but it is no silver bullet: today it is not that hard to automate a browser to sniff e-mail addresses. If it can be displayed in a browser, it can be harvested. You could even imagine a bot that takes screenshots of a browser window and uses OCR to extract addresses to beat your million-dollar obfuscation technique.

Depending on where and why you want to obfuscate e-mail addresses, these techniques could be useful:

  • Restrict e-mail visibility: you may hide e-mail addresses on your website/forum from anonymous users or from new users (with little to no activity or posts to date), or even hide them completely and replace e-mail contact between members with a built-in private messaging feature.
  • Use a dedicated spam-filtered email: you will get spammed, but it will be limited to this particular address. This is a good trade-off when you need to expose the email address to any user.
  • Use a contact form: while bots are pretty good at filling forms, it turns out that they are too good at filling forms. Hidden field techniques can filter most of the spam coming through your contact form.
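The hidden-field idea (often called a honeypot) boils down to a very small check on the server. This C++ sketch assumes the submitted form has already been parsed into a name/value map; the field name "website" and the function `looks_like_bot` are hypothetical, and the field itself would be hidden from humans with CSS in the form's markup.

```cpp
#include <cassert>
#include <map>
#include <string>

// Submitted form fields, name -> value (hypothetical representation).
using FormFields = std::map<std::string, std::string>;

// Honeypot check: the form contains an extra field (here called "website")
// that is hidden from humans with CSS. A browser submits it empty; a bot that
// fills in every field gives itself away. A missing field is treated as
// suspicious too, since a real browser always sends it.
bool looks_like_bot(const FormFields& fields, const std::string& honeypot = "website") {
    auto it = fields.find(honeypot);
    return it == fields.end() || !it->second.empty();
}
```

Submissions flagged this way can simply be dropped silently, so the bot gets no feedback to adapt to.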

One common way of hiding e-mail addresses from bots and spammers is to create an image containing the address. Facebook does this, for instance. Now, using images for e-mail addresses is inherently bad for accessibility, because screen readers will not be able to read them. But even otherwise, there are several free character recognition programs that do a pretty good job of decoding such e-mail images.

Always keep in mind that whatever makes the address difficult for spammers to identify also makes it difficult for your users. The Wikipedia article on e-mail obfuscation, or address munging, is worth a look.

The real question is whether the extra effort will be put in by harvesters and if the (major? minor?) barrier to the harvesters is worth the possible problems for your users.

In the end, like so many others, this article is about fighting spam. In my opinion, spam has become such a problem, and so many address databases have changed hands, that we're beyond hiding our addresses. Instead, consider more efficient ways of classifying and blocking spam.

Powered by WordPress, Content and Design by Kai Bellmann