
Sorting Strings containing Numbers

March 24th, 2008

Filed under: .NET, General Programming — Kai @ 6:20 pm

It’s often necessary to sort lists, and the Sort method of the ArrayList class shows that it doesn’t have to be difficult. Other collections can be sorted as well, as long as their elements implement IComparable.
The Sort method of ArrayList has several overloads; some of them accept a comparer, which is an object that implements the IComparer interface.

The interface has just one method called Compare, which returns zero, a positive value or a negative value depending on whether the first argument is equal to, greater than or less than the second.

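If you want to write your own comparer, a skeleton looks like this:

public class MySpecialComparer : System.Collections.IComparer
{
    /// <summary>
    /// Default constructor - there are no fields to initialize yet
    /// </summary>
    public MySpecialComparer()
    {
    }

    /// <summary>
    /// Required by IComparer; as a placeholder it compares the string representations
    /// </summary>
    public int Compare(object objA, object objB)
    {
        return String.Compare(objA.ToString(), objB.ToString());
    }
}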

The comparer also decides whether the result is sorted in ascending or descending order. Consider the following code, which compares strings in descending order:

public class DescendingComparer : IComparer
{
  public int Compare(object objA, object objB)
  {
    return String.Compare(objB.ToString(), objA.ToString());
  }
}

The trick is just to compare B to A instead of the other way round.

To sort a simple one-dimensional array in alphabetical order, we can use the static Sort method of the Array class. Array.Sort sorts the array in place, meaning we don’t have to create another array to hold the result. Here is how:

String[] myAnimals = {"Zebra","Elephant","Snake"};
Array.Sort(myAnimals);

As expected, the array is sorted in ascending order, so iterating over its elements produces the following output:

Elephant, Snake, Zebra

Computer string sorting algorithms generally don’t order strings containing numbers the way a human would. Consider:

rfc1.txt, rfc2086.txt, rfc822.txt

It would be friendlier if the program listed the files as

rfc1.txt, rfc822.txt, rfc2086.txt

Filenames sort properly if people insert leading zeros, but they don’t always do that.
Sorting strings that contain numbers is tricky because the numeric parts have to be compared numerically.

Imagine an array of values like this:

string[] Items = { "z4", "z2", "z15", "z1" };

The idea is to write a comparer (a class that implements IComparer) which extracts the numeric part, e.g. with a regular expression such as [0-9]+
Then sort as usual with a Sort overload that accepts the comparer.

Generally you’d “invent” an algorithm that breaks strings into chunks, where a chunk contains either only digits or only non-digit characters. These chunks are then compared against each other pair by pair. If both chunks contain numbers, a numerical comparison is used. If either chunk contains other characters, an ordinary character (ASCII) comparison is used.
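
The chunking idea described above could be sketched roughly like this. It is only a minimal illustration under my own assumptions: the class name NaturalStringComparer and its helper methods are made up for this example and are not part of the .NET Framework, only ASCII digits count as numbers, and letters are compared ordinally (case-sensitive).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not a framework class.
public class NaturalStringComparer : IComparer<string>
{
    public int Compare(string a, string b)
    {
        // Treat null as smaller than any string.
        if (a == null || b == null)
            return (a == null ? 0 : 1) - (b == null ? 0 : 1);

        int i = 0, j = 0;
        while (i < a.Length && j < b.Length)
        {
            string chunkA = NextChunk(a, ref i);
            string chunkB = NextChunk(b, ref j);

            int result;
            if (IsAsciiDigit(chunkA[0]) && IsAsciiDigit(chunkB[0]))
                result = CompareNumeric(chunkA, chunkB);        // both chunks are numbers
            else
                result = string.CompareOrdinal(chunkA, chunkB); // plain character comparison

            if (result != 0) return result;
        }

        // All shared chunks were equal: the string with chunks left over is the greater one.
        return (a.Length - i) - (b.Length - j);
    }

    private static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    // Reads one run of digits or one run of non-digits, starting at pos.
    private static string NextChunk(string s, ref int pos)
    {
        int start = pos;
        bool digits = IsAsciiDigit(s[pos]);
        while (pos < s.Length && IsAsciiDigit(s[pos]) == digits) pos++;
        return s.Substring(start, pos - start);
    }

    // Compares two digit runs by value without parsing them,
    // so very long numbers cannot overflow an int.
    private static int CompareNumeric(string x, string y)
    {
        x = x.TrimStart('0');
        y = y.TrimStart('0');
        if (x.Length != y.Length) return x.Length.CompareTo(y.Length);
        return string.CompareOrdinal(x, y);
    }
}
```

With such a comparer, Array.Sort(Items, new NaturalStringComparer()) would order the sample array as z1, z2, z4, z15.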

Ian Griffiths offered a code sample that helps with sorting strings that contain numbers:

/// <summary>
/// Compares two sequences.
/// </summary>
/// <typeparam name="T">Type of item in the sequences.</typeparam>
/// <remarks>
/// Compares elements from the two input sequences in turn. If we
/// run out of list before finding unequal elements, then the shorter
/// list is deemed to be the lesser list.
/// </remarks>
public class EnumerableComparer<T> : IComparer<IEnumerable<T>>
{
    /// <summary>
    /// Create a sequence comparer using the default comparer for T.
    /// </summary>
    public EnumerableComparer()
    {
        comp = Comparer<T>.Default;
    }
 
    /// <summary>
    /// Create a sequence comparer, using the specified item comparer
    /// for T.
    /// </summary>
    /// <param name="comparer">Comparer for comparing each pair of
    /// items from the sequences.</param>
    public EnumerableComparer(IComparer<T> comparer)
    {
        comp = comparer;
    }
 
    /// <summary>
    /// Object used for comparing each element.
    /// </summary>
    private IComparer<T> comp;
 
 
    /// <summary>
    /// Compare two sequences of T.
    /// </summary>
    /// <param name="x">First sequence.</param>
    /// <param name="y">Second sequence.</param>
    public int Compare(IEnumerable<T> x, IEnumerable<T> y)
    {
        using (IEnumerator<T> leftIt = x.GetEnumerator())
        using (IEnumerator<T> rightIt = y.GetEnumerator())
        {
            while (true)
            {
                bool left = leftIt.MoveNext();
                bool right = rightIt.MoveNext();
 
                if (!(left || right)) return 0;
 
                if (!left) return -1;
                if (!right) return 1;
 
                int itemResult = comp.Compare(leftIt.Current, rightIt.Current);
                if (itemResult != 0) return itemResult;
            }
        }
    }
}

To use it, the strings first have to be split into numeric and non-numeric parts with a regular expression.
With C# 3.0 it can be used like this:

Func<string, object> convert = str =>
{   try { return int.Parse(str); }
    catch { return str; } };
 
var sorted = testItems.OrderBy(
    str => Regex.Split(str.Replace(" ", ""), "([0-9]+)").Select(convert),
    new EnumerableComparer<object>());

Again, lambda expressions help a lot. sorted is a lazily evaluated sequence (an IOrderedEnumerable<string>) that can be iterated with foreach.

When sorting or comparing strings that contain numbers, you should always consider ordering them the way a human would.

Short type names

March 24th, 2008

Filed under: .NET — Kai @ 2:10 pm

One nice tidbit is how to create short type names (aliases) without having to spell out the full name of every non-primitive type, especially inside generic type names.

Normally, when “using” is used to create an alias, every type name that is not a keyword has to be fully qualified, even if the appropriate namespace has already been imported.

using System.Collections.Generic;
using T = System.Collections.Generic.List<string>;

However, if the alias is declared inside a namespace, it can make use of any imports declared in an enclosing scope.

using System.Collections.Generic;
 
namespace MyCompany
{
    using T = List<string>;
}
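
To see the alias in action, here is a minimal sketch. The names StringList and AliasDemo are made up for this example; the point is only that List<string> does not need to be fully qualified because the alias sits inside the namespace.

```csharp
using System.Collections.Generic;

namespace MyCompany
{
    // The alias can refer to List<string> directly because the
    // using directive above is declared in the enclosing scope.
    using StringList = List<string>;

    public static class AliasDemo
    {
        public static int CountNames()
        {
            StringList names = new StringList();
            names.Add("Kai");
            names.Add("Ian");
            return names.Count;
        }
    }
}
```

An alias like this is only visible in the file and namespace block where it is declared, so it is a convenience for long generic names rather than a new type.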

XML the LINQ way

March 11th, 2008

Filed under: .NET — Kai @ 6:51 pm

In former posts I often wrote something about LINQ (Language Integrated Query, pronounced “link”) especially when writing about advantages of .NET 3.0.

Today I’d like to give you some really useful snippets you can use with LINQ when accessing an XML file.

First of all I’ve got to show you the XML file I refer to. It’s just a small document that holds some information about music albums.

<?xml version="1.0" encoding="utf-8" ?>
<Albums>
  <Album>
    <Artist>Black Sabbath</Artist>
    <Title>Seventh Star</Title>
    <Date>2/20/1986</Date>
  </Album>
  <Album>
    <Artist>Judas Priest</Artist>
    <Title>Jugulator</Title>
    <Date>3/24/1997</Date>
  </Album>
</Albums>

Reading XML with LINQ can be achieved in the twinkling of an eye.

At first I create some anonymous objects with the properties Artist, Title and Date.

XDocument xmlDoc = XDocument.Load("albums.xml");
 
var albums = from album in xmlDoc.Descendants("Album")
                select new
                {
                  Artist = album.Element("Artist").Value,
                  Title = album.Element("Title").Value,
                  Date = album.Element("Date").Value,
                };

The result (albums) is an IEnumerable collection filled with anonymous objects. It can simply be accessed like this:

foreach (var album in albums)
{
   MessageBox.Show(album.Title);
   // Date is still a string here, so it has to be parsed before doing date arithmetic
   TimeSpan age = DateTime.Now.Subtract(DateTime.Parse(album.Date));
   MessageBox.Show(age.Days + " days old");
}

To avoid using anonymous types all the time, a better solution is to create a typed list.

List<Album> albums = (from album in xmlDoc.Descendants("Album")
   select new Album
   {
     Artist = album.Element("Artist").Value,
     Title = album.Element("Title").Value,
     Date = DateTime.Parse(album.Element("Date").Value),
   }).ToList();

This is pretty clear and can be extended easily.
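
The typed query needs a class to fill. The post doesn’t show one, so the following is only a sketch under my own assumptions: the Album class with its three properties and the AlbumReader helper are hypothetical names, and the date is parsed with the invariant culture because the sample file uses month/day/year.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical class; its definition is assumed for this example.
public class Album
{
    public string Artist { get; set; }
    public string Title { get; set; }
    public DateTime Date { get; set; }
}

public static class AlbumReader
{
    // Turns every <Album> element of the document into an Album object.
    public static List<Album> Read(XDocument xmlDoc)
    {
        return (from album in xmlDoc.Descendants("Album")
                select new Album
                {
                    Artist = album.Element("Artist").Value,
                    Title = album.Element("Title").Value,
                    // Culture-independent parsing of dates like 2/20/1986
                    Date = DateTime.Parse(album.Element("Date").Value,
                                          CultureInfo.InvariantCulture)
                }).ToList();
    }
}
```

It could be called with AlbumReader.Read(XDocument.Load("albums.xml")); afterwards the Date property is a real DateTime, so date arithmetic works without further parsing.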

It would also be possible to restrict the selection. If the list should contain only the albums whose artist name contains “Black”, just type:

List<Album> albums = (from album in xmlDoc.Descendants("Album")
   where album.Element("Artist").Value.Contains("Black")
   select new Album
   {
     ...
   }).ToList();

Distinct, group by, order by and some other operations can be done the same way.
To filter the result you can build really complex statements using LINQ.

Creating an XML file is easy as well. The XDocument and XElement types have Load and Parse methods for reading existing XML; new documents are built with their constructors.

The first thing to do is to create a document:

XDocument doc = new XDocument(
    new XDeclaration("1.0", "utf-8", "yes"),
    new XComment("All my albums"),
 
    //... new xElements
);

This is an anonymously typed data structure:

var albums = new[]
{
   new {Artist = "Iron Maiden", Title = "Flight Of Icarus", ID = 3},
   new {Artist = "Iron Maiden", Title = "Running Free", ID = 1},
   new {Artist = "In Flames", Title = "The Mirror's Truth", ID = 4},
};

When the XML is created from this, it is also put into proper order (notice that the IDs are not in sequence). To better demonstrate how the data can be ordered, I added an identification number to each album.

XElement albumsXml = new XElement("albums",
                        from a in albums
                        orderby a.ID //descending 
                        select new XElement("Album",
                            new XElement("Artist", a.Artist),
                            new XElement("Title", a.Title),
                            new XAttribute("ID", a.ID)
                            )
                        );

Notice the difference between XElement and XAttribute:

Element:

<Artist>Iron Maiden</Artist>

Attribute:

<Album ID="1">...</Album>

or, for an element without content,

<Album ID="1" />

When elements are preferable and when attributes are recommended is described here:

[IBM] Principles of XML design: When to use elements versus attributes

Finally, don’t forget to save the document to a file or stream.

doc.Save(@"C:\albums.xml");
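
Putting the pieces together, the creation steps might look like the sketch below. The AlbumWriter class and its Build method are names I made up for this example, and I used “Albums” as the root element name to match the file shown at the beginning.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AlbumWriter
{
    // Builds the document in memory; the caller decides where to save it.
    public static XDocument Build()
    {
        var albums = new[]
        {
            new { Artist = "Iron Maiden", Title = "Flight Of Icarus", ID = 3 },
            new { Artist = "Iron Maiden", Title = "Running Free", ID = 1 },
            new { Artist = "In Flames", Title = "The Mirror's Truth", ID = 4 },
        };

        return new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XComment("All my albums"),
            new XElement("Albums",
                from a in albums
                orderby a.ID                      // albums are written in ID order
                select new XElement("Album",
                    new XAttribute("ID", a.ID),   // becomes <Album ID="...">
                    new XElement("Artist", a.Artist),
                    new XElement("Title", a.Title))));
    }
}
```

Calling AlbumWriter.Build().Save(path) would then write the file, with the albums ordered by ID.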

I didn’t want to tell you everything about LINQ; the goal was rather to show how incredibly powerful it is. This covers only a small portion of it. LINQ can be used for databases just as easily as it was used here on XML. As you can see, the LINQ project is extremely ambitious and this blog entry has just scratched the surface.

PowerShell for admins

March 10th, 2008

Filed under: .NET — Kai @ 5:57 pm

Windows PowerShell is a new command-line shell and task-based scripting technology that provides comprehensive control and automation of system administration tasks.

Over time I became more and more dissatisfied with the native Windows command shell. I learned to program the UNIX shell (bash, to be specific) a bit and found it to be far more advanced than the Windows shell.

Aesthetically, PowerShell doesn’t look too different from most existing command shells. In fact, on a superficial level you could probably be fooled into thinking that it is exactly the same as either bash or cmd. PowerShell, however, has a very powerful underlying structure that ties in neatly with the registry, the .NET Framework and (more obviously) the file system.

I think the text shell is the preferred way to do system administration for most people who have to take care of a system.

Let’s have a closer look:

PowerShell is available for Windows XP, 2003 Server and Vista.

  • Download the .NET Framework 2.0
  • Download PowerShell
  • Unzip the downloads and run their respective setups. I used all the defaults.

Type “Powershell -?” and you’ll see the command line parameters you can use to start PowerShell. There are a couple that are interesting here.

Using PowerShell is a bit different from any other command shell I have used. It is centered around the concept of cmdlets (command-lets), which are commands that manipulate objects to perform a single task.
Every command in PowerShell returns a collection of objects. These objects can be manipulated in the exact same way as if you were using an object-oriented language. The ls/dir command creates a collection of file objects which can have any .NET function performed upon them from within the shell.

If you’d like to use a command with a pipe, you can do it this way:

PowerShell -NoProfile -NonInteractive -Command "Get-Process *ss |sort handles"

This PowerShell syntax is obviously more verbose, but it is also more powerful.
Also consider the “-WhatIf” parameter, which shows what a command would do without executing it, and the “-Confirm” parameter, which asks for confirmation before execution. Many cmdlets that change the system support both.

Killing a task is easier with PowerShell than with Task Manager. For example, let’s attempt to kill Winamp. First find the process:

get-process wina*

Once the Process ID has been identified, you can kill the errant process by entering:

stop-process -id 2792

This is also possible:

get-process wina* | stop-process

Windows PowerShell includes an untyped scripting language which can implement complex operations using cmdlets imperatively. The scripting language supports variables, functions, branching (if-then-else), loops (while, do, for, and foreach), and error handling, as well as integration with .NET. Variables in PowerShell scripts have names that start with $; they can be assigned any value, including the output of cmdlets. While the language is untyped, internally the variables are stored with their types, which can be either primitive types or objects.

Microsoft’s PowerShell is a solid step forward in script-language design.

There’s a whole bunch of scripts available on the internet; some are just one line, others are really complex. There are no limits set on your creativity.

Remote administration via SSH is possible as well. Just use the freeSSHd server and log in with PuTTY (an SSH client):

$PowerShell/v1.0/powershell.exe -Command -

Another alternative for a better Windows shell is win-bash, which is based on nt_bash, an early bash port for Windows NT started by Mountain Math Software some years ago. As far as I know, the nt_bash port stopped in alpha status and was never finished.

Download Windows PowerShell 1.0: [link]
You can also have a look at the preview release of Version 2.0 from November 6, 2007.

Win-bash project on SourceForge: [link]

Worldclock Application

February 27th, 2008

Filed under: .NET — Kai @ 1:12 am

Due to the fact that I sometimes chat in IRC channels (like #cpp, #wxWidgets or #xlib), I have to deal with people located almost everywhere in the world. I noticed that not just the times of day I hang around in IRC are curious; their times are often kind of strange too.
To better understand when they’re available, I needed a world clock.
There are dozens of world clock implementations on the Internet, but most of them didn’t meet my expectations.

That’s why I decided on the spot to code my own. I think the solution is worth looking at because it is free of complexity and has as few lines of code as possible.

Of course, if I had written a console application, I would have needed only two or three lines of code.

For example in Perl this would have been very easy:

#!/usr/local/bin/perl
@timeData = localtime(time);

and then add or subtract the GMT difference and convert the result to a date again. Finally you just have to print it.

It was my intention to create at least a tiny graphical user interface for some user friendliness. I decided to do a WinForms GUI; I could also have done it with MFC or wxWidgets, and maybe I’ll port it to WPF sometime.

This is what it looks like:

[Screenshot: WorldClock]

Let’s have a look behind the scenes.

I created a window, a GroupBox, two labels and a ComboBox for the selection of the country.
All time zones are stored in an XML file:

<?xml version="1.0" standalone="yes" ?>
<Countries>
	<Country NAME="AbuDhabi" GMT="4" />
	<Country NAME="Adelaide" GMT="9.5" />
	<Country NAME="Alaska" GMT="-9" />
        ...
</Countries>

All data is held in a class named Country:

public class Country
{
    private double _dGmt;
    private String _sName;
    private System.Drawing.Bitmap _imgFlag;

    public Country(String name, double diff)
    {
         Name = name;
         Gmt = diff;
    }

    public Country(String name, double diff, System.Drawing.Bitmap flag)
    {
         Name = name;
         Gmt = diff;
         _imgFlag = flag;
    }

    public System.Drawing.Bitmap ImgFlag
    {
         get { return _imgFlag; }
         set { _imgFlag = value; }
    }
    //the rest of the get/set properties (Name, Gmt) follows here.
}

I provide two constructors, one taking the name and time zone and a second that additionally takes a bitmap, because I may someday add a country flag to each time zone.

After loading the form, a typed list of Country objects is created.
I could have counted the entries of the XML file and created a typed array of fixed size accordingly, but for about 50 elements it doesn’t really matter.

List<Country> lcountries = new List<Country>();

Now the XML file gets read into Country objects:

private void ReadFromXml(List<Country> countries)
{        
    String name = String.Empty;
    double gmt = 0;
    String tmpGmt = String.Empty;
 
    String filename = @".\countries.xml";
    if (!File.Exists(filename))
      MessageBox.Show(filename + " can't be accessed.", 
                                "Error", 
                                MessageBoxButtons.OK, 
                                MessageBoxIcon.Error);
    else
    {
       XmlDocument doc = new XmlDocument();
       doc.Load(filename);
       XmlElement root = doc.DocumentElement;
       foreach (XmlNode @daten in root.ChildNodes)
       {
           name = @daten.Attributes["NAME"].InnerText;
 
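           tmpGmt = @daten.Attributes["GMT"].InnerText;
           // Parse with the invariant culture so "9.5" means nine and a half on every system
           Double.TryParse(tmpGmt, System.Globalization.NumberStyles.Float,
                           System.Globalization.CultureInfo.InvariantCulture, out gmt);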
 
           countries.Add(new Country(name, gmt));
       } 
    }
}

The most important thing, which saves a lot of lines, is the binding between the ComboBox and the list.

this.cbCountries.DataSource = lcountries;
this.cbCountries.DisplayMember = "Name";
this.cbCountries.DataBindings.Add("Text", lcountries, "Name");

The only thing we have to do now is react when the selected item of cbCountries changes:

private void cbCountries_SelectedIndexChanged(object sender, EventArgs e)
{
    Country selCountry = (Country)this.cbCountries.SelectedItem;
    _dDiff = selCountry.Gmt;
}

Every second a timer (interval 1000 ms) refreshes the labels that show the current date and time.

private void timer1_Tick(object sender, EventArgs e)
{
    this.lblDate.Text = GetDate().ToShortDateString();
    this.lblTime.Text = GetDate().ToLongTimeString();
}
 
private DateTime GetDate()
{
     DateTime t = DateTime.Now.ToUniversalTime();
     t = t.AddHours(_dDiff);
     return t;
}
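
The offset arithmetic can be tried in isolation. The following is just a sketch; the class TimeZoneMath and the method ToLocal are hypothetical names that are not part of the application. It shows that AddHours accepts fractional values, which is why half-hour zones like Adelaide (GMT 9.5) work with a double.

```csharp
using System;

public static class TimeZoneMath
{
    // Shifts a UTC time by the GMT difference stored for a country.
    // Fractional offsets such as 9.5 are fine because AddHours takes a double.
    public static DateTime ToLocal(DateTime utc, double gmtOffsetHours)
    {
        return utc.AddHours(gmtOffsetHours);
    }
}
```

For example, midnight UTC shifted by 9.5 gives 09:30 on the same day, and shifted by -9 gives 15:00 on the previous day. Note that fixed offsets like these do not account for daylight saving time.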

Totally simple, written within 15 minutes and easy for everybody else to comprehend.
If you can also benefit from a world clock application, you can download it here.

It also works well on Linux using Mono (>= 1.0). The only thing I had to take into account was the relative path of the XML file. There might be cleaner solutions, but to me this is the shortest way to achieve my aim.

String filename = @".\countries.xml";
 
System.OperatingSystem osInfo = System.Environment.OSVersion;
 
if (!osInfo.Platform.ToString().Contains("Win"))
     filename = @"./countries.xml";
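
One of the cleaner solutions might be to let the framework pick the separator. This is only a sketch of an alternative, not what the application does; CountryFile and RelativePath are made-up names.

```csharp
using System.IO;

public static class CountryFile
{
    // Path.Combine inserts the separator of the current platform,
    // so no operating system check is needed.
    public static string RelativePath()
    {
        return Path.Combine(".", "countries.xml");
    }
}
```

On Windows this yields .\countries.xml and on Linux ./countries.xml, which are the same two values the platform check produces.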

For further information you can have a look at Naming a File (Windows) on MSDN.

String vs. StringBuilder / StringBuffer

February 26th, 2008

Filed under: .NET, Java — Kai @ 6:22 pm

Some time ago I learned when to use StringBuilder instead of String. Now I want to know the differences and advantages exactly.

As a C++ developer working with .NET, I soon noticed the differences in the data types in .NET, especially the standard string object.
A standard .NET string object is immutable.

Because of that, this is pretty inefficient:

String s = String.Empty;
for (int i = 0; i < 1000; i++) s += "something";

For each +=, the String’s content needs to be copied into a new instance that has a bit more space at the end for “something”.
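
What immutability means in practice can be checked with a few lines. This is just a small sketch; ImmutabilityDemo and its method are names invented for this example.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns true if += produced a new instance and left the old one untouched.
    public static bool AppendCreatesNewInstance()
    {
        string original = "some";
        string alias = original;   // both variables reference the same instance
        original += "thing";       // allocates a new string; "some" itself is not modified

        return alias == "some"
            && original == "something"
            && !object.ReferenceEquals(original, alias);
    }
}
```

The second variable still sees the old content because the += never changed the original instance; it only made the first variable point to a newly allocated string.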

A better solution is StringBuilder which is used to store character strings that will be changed (String objects cannot be changed). It automatically expands as needed.

The significant performance difference between these two classes is that StringBuilder is faster than String when performing many concatenations.

In Java, String objects are constant strings as well. Once they are initialized and populated, the value and memory allocation are set. If the value changes in any way, a new object is created for the new value.

But there’s a difference in Java:
StringBuilder was added in Java 5. It is identical in all respects to StringBuffer except that it is not synchronized, so instances of StringBuilder are not safe for use by multiple threads. If such synchronization is required, it is recommended that StringBuffer be used.
For single-threaded programs, the most common case, avoiding the overhead of synchronization makes StringBuilder very slightly faster.

In Java, StringBuffer should be used when several threads share one buffer.
But be careful: the StringBuffer class is only available in Java, not in the .NET languages. In .NET there is just StringBuilder, and its instance members are not guaranteed to be thread safe, so access from multiple threads has to be synchronized by the caller.

The usage of a .NET StringBuilder is as simple as expected:

System.Text.StringBuilder sb = new System.Text.StringBuilder();
for(int i=0; i<1000; i++)
{
    sb.Append("something");
    sb.Append(12);
    sb.Append(DateTime.Today);
    sb.AppendFormat("{0}: {1} ", i, "times passed the loop");
}
 
String result = sb.ToString();

Append() is overloaded for lots of types.

Using Java it’s about the same:

String sum = new StringBuffer().append(40.00).append(" Dollar").toString();

Every time you add two strings using the + operator, a new string is allocated in memory and used for the result. So if you append to a string 100 times in a loop using +, there will be 100 new allocations, each copying everything built so far. A StringBuilder appends into its internal buffer instead and only allocates when that buffer has to grow.

StringBuilder provides a length() and a capacity() method (in .NET these are the Length and Capacity properties):

When creating a new StringBuilder

StringBuilder sb = new StringBuilder();

the length is 0 but the capacity is 16.

When creating a StringBuilder with a String in constructor

StringBuilder sb = new StringBuilder("something");

now the length is the length of “something” (9) and, in Java, the capacity is 16 + 9 = 25.

If you expect that your buffer will need a capacity of 1000, you can initialize the builder with a buffer size:

StringBuffer sb = new StringBuffer(1000);
System.out.println(sb.length());         // prints 0
System.out.println(sb.capacity());       // prints 1000

length() always returns the number of chars in the builder; the number returned by capacity() is the same as or bigger than the length and tells you how many chars the buffer can hold before it has to grow. If the buffer is full, the builder has to allocate new memory and copy the contents, which takes time.
The StringBuffer should be initialized with the expected buffer size to avoid time-consuming reallocation.
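
The same advice applies to the .NET StringBuilder. A minimal sketch, assuming the final size is known in advance; BuilderDemo and Repeat are hypothetical names used only here.

```csharp
using System.Text;

public static class BuilderDemo
{
    // Builds "something" repeated count times without any buffer growth,
    // because the required capacity is passed to the constructor.
    public static string Repeat(int count)
    {
        const string part = "something";   // 9 characters
        StringBuilder sb = new StringBuilder(count * part.Length);
        for (int i = 0; i < count; i++)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }
}
```

In .NET, sb.Length and sb.Capacity are properties rather than methods; a builder created with new StringBuilder(1000) starts with Length 0 and Capacity 1000.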

You should always be aware that creating a StringBuilder costs time: if you just add three strings together, using a builder will usually be slower than concatenating them the common way.

Finally I got a visualized comparison of both:

[Chart: comparison of StringBuilder and String]

The numbers on the x-axis represent the number of iterations.

The blue line is the performance of the pure String approach. The red line is StringBuilder at its basic settings.
The green line represents a StringBuilder initialized to the size of the final string.

StringBuilder performance can sometimes be a bit tricky because there’s a kind of “break-even point” you should always keep in mind.

A simple rule of thumb is:
If you have to concatenate a string more than 10 times, it’s better to use StringBuilder.

Powered by WordPress, Content and Design by Kai Bellmann