How to generate random characters or strings?

How to specify the number of times a character can repeat?

Ex: I have ABC , where C should repeat 4 times in a new 6-character string:

( AB CC CC) or (AC CB CC ) or (AC CC CB )

can only enter position 1, 3, 5.
B can only enter position 2, 4, 6.
C can enter any position.

  • AA or BB is not allowed to appear, only AB.

Example not allowed:

( AA-CC-CC - ) or ( CC-AA-CC - ) or ( : CC CC AA - ) or
( the BB-CC-CC - ) or ( CC-BB-CC - ) or ( CC-CC-BB - ).

I expect all these combinations:

AC AC CC
AC CC CB
AC CB CC
AC CC AC
AB CC CC
CB AC CC
CB CC AC
CB CC CB
CB CB CC
CC AC AC
CC AC CB
CC CB AC
CC CB CB
CC CC AB
CC AB CC

(no need is in that order, but must have all these outputs)

Author: Diego Well, 2017-10-24

4 answers

I will attack the specific case of your question. Among so many possible ways to generate random information chains, I found these to be the most significant. But first, a little concept.

Is all information

First of all, I would like to point out that letters, numbers, and what else is being treated here is nothing but information that is being generated to be consumed by another process.

In the case of strings( or words), the the information that makes them up is:

  1. your characters
  2. the position of each of its characters

So for each letter, two pieces of information are required. We can then represent strings of letters as a set of character/position tuples . For example, the word banana:

{
  ('b',0),
  ('a',1),
  ('n',2),
  ('a',3),
  ('n',4),
  ('a',5)
}

Since we have the guarantee that the second term of the tuple will always be unique (if not, then we would assume that a word could have two or more letters in the same position, which is not the case), so each tuple formed is unique and the cardinality of the set that generates a given word is equal to the length of the word.

Each letter a different information? Maybe not

We have seen that a word is nothing more than a set of letter/position tuples. For your specific case, we have 3 possible letters for 6 positions. Letter are:

  • A
  • B
  • C

But they obey certain restrictions. For example, in positions 0, 2 and 4 we can have A or C, never B. For positions 1, 3 and 5 we can have B or C. This implies that for an informed position i, the letter can be C or não-C. No more.

Why this? Because the maximum possible amount of distinct values in each position is 2: can be {A,C}, or it can be {B,C}. Therefore, we can represent with only 1 bit of information in addition to the positional information what is the character in the specific position. As C is common for both groups of options, either if it is C or if it is distinct from C (hence não-C).

Therefore, for your case in question, it is enough that we generate a sequence of 0s and 1s (with 0 representing não-C and 1, C), applied to the constraint of owning exactly 4 times bit 1 (the C needs appear exactly 4 times).

But we can still compress more!

See what is not Random? (video in English) for more details.

In your expected output list, the first word is ACACCC. In the tuple set representation letter / position:

{
  ('A', 0),
  ('C', 1),
  ('A', 2),
  ('C', 3),
  ('C', 4),
  ('C', 5)
}

As already discussed, the letter being A or B does not carry information, so it is either C or não-C, 1 or 0:

{
  ('0', 0),
  ('1', 1),
  ('0', 2),
  ('1', 3),
  ('1', 4),
  ('1', 5)
}

Noted how are the 1s in the majority? Well, the differential between this word and this other also valid word is just an exchange of positions between letters:

{
  ('1', 0),
  ('0', 1),
  ('0', 2),
  ('1', 3),
  ('1', 4),
  ('1', 5)
}

What does that mean then? That I would only need a single piece of information to differentiate between ACACCC and CBACCC. But our representational model is not minimal enough. The only real information exchanged was the position of the letter / position tuple position ('0', 0) to ('0', 1). How can that be? Because all positions that does not have 0 necessarily has 1, so I need to inform only the positions that contains 0. As I already know which character to be informed, it ceased to be information, being despicable. Therefore, the words ACACCC and CBACCC are represented by, respectively:

{ 0, 2 }
{ 1, 2 }

TL; DR

The representation of the string is summarized only in the description of the position of the two different letters of C that are in the word. All other 4 positions are then C. If the index is even, the letter não-C is A, otherwise it is B. Note that the two positions need to be distinct, otherwise we will be offering a word with five Cs, which is not what we want.

Generating a random word with data

Knowing how to represent information, generating the answer now consists of generating two distinct integers between 0 and 5 inclusive. An alternative would be to play data:

two 6-sided data with distinct results, in this case the result of the scroll was 2 and 4

In this case, the string obtained would be 101011, which equals CBCBCC. If the result were two equal numbers, it would be enough to throw the dice again: of the 21 possible results, 15 are of different numbers, and each of these 15 has twice the chance of happening before a result of two equal numbers.

However, it would not be very practical for your program to throw data to generate the string... or would it be?

To create a " data", we need a random number generator. In this case, the generator is an object of class Random. So just generate the random integer in the range [0, 5], which is done using Next(6) (Next(m) generates an integer in the range [0, m) open in m; since [0, 5] is equivalent in integers to [0, 6), the number passed was 6). To respect the randomness predicted by the random number generator, always the same generator needs to be called again and again to get numbers "reasonably" random.

So, your algorithm consists of finding two distinct random integers in the range [0, 6). So we can summarize it like this:

// cria o gerador de números aleatórios
// você também pode prover esse gerador, no lugar de ficar criando um novo a cada instante
// reutilizar o mesmo gerador de números aleatórios nos garante a entropia máxima possível entre duas gerações de números 
Random r = new Random();

int a, b;

// gira os dados...
a = r.Next(6);
b = r.Next(6);

// insiste na rolagem de um dado até obter um resultado distinto
while (a == b) {
  b = r.Next(6);
}

Okay, now we have two random integers between [0, 6). This is all the information needed to get the string.

string cadeia = "";
for (int i = 0; i < 6; i++) {
  if (i == a || i== b) {
    // caso seja um dos números aleatórios, então é não-C
    if (i % 2 == 0) {
      // posições pares não-C são dedicadas a A
      cadeia += "A";
    } else {
      // caso ímpar, B
      cadeia += "B";
    }
  } else {
    // caso contrário, preencha com C
    cadeia += "C";
  }
}

Then, after performing this procedure, we have a string in the desired format.

Frequency of words

To know how often each word will appear, we have the chance to rotate thousands of times and take the average or try a more theoretical approach. The theory seems to be more fun.

The strategy used for random number generation by the class Random is a subtractive generator (source ). According to the source of Rosetta Code, the developers of Freeciv imply that the calculation using this subtractive strategy and a module generates low entropy in the least significant bits .

So, the expected entropy for our case is not the greatest. Then it would be necessary to rotate several times until you get the 15 different words. Since the entropy is possibly low, one can expect to rotate in a few words until one gets a new one.

Wandering walk

Another alternative is to use a "wandering walk" to assemble the desired word. We what is that?

Well, the most correct term would be random ride, but wandering walk sounds more natural. Imagine that you have the following grid:

Grid 4x2 with C on the horizontal axis and non-C on the vertical axis, with the pairs of integers properly marked and with the most extreme point circulated

The best explanation I think there is for wandering walks is a video from the Infinite Series: what is a Random Walk?.

In the particular case, I will take a 4x2 grid (4 horizontally and 2 vertically) for our wandering walk. So all I have to do is define the probabilities of taking the steps. Oh, it also goes without saying what these steps are. I go back to the meaning of the steps later, I will finish the rest of the modeling of the walk before entering the meanings, but I will soon advance that I dedicate an entire section to explain this.

Since my grid is finite, if the Walker is on the edge, he can no longer follow in that direction. For example, starting from the origin, if in the first two steps the Walker chose to climb and climb, the third step will necessarily be to the right or down, as he can not climb nor move to the left, as it would fall off the grid.

I'm also not interested in back-tracking, so the Walker will always go towards the final destination, which is the position (4,2). It cannot therefore move either down or left, as both would be a back-track of the path, getting closer to the origin than of fate after the step.

Note that on my floor, I am not using the "drunk floor" as was the example Kelsey used in the video. My movement options are more limited so I can better represent the problem I am attacking.

So, I already know that moves to the left or down always have a possibility of 0. I also know that after any two steps up, the third step in that same direction also has chance 0, and analog for 4 steps to the right, with the fifth step with chance 0. But what about the other steps?

I would like each vertical transition from the same level (like (x, 0) to (x, 1)) to have the same chance global to occur. I am not directly interested in the chance of being in each state, so I can manipulate these values at will.

To exit the first level and enter the second, I have 5 possibilities:

  • (0, 0) ==> (0, 1)
  • (1, 0) ==> (1, 1)
  • (2, 0) ==> (2, 1)
  • (3, 0) ==> (3, 1)
  • (4, 0) ==> (4, 1)

So in all, each transaction needs to be fired 20% of the time.

Starting from the origin, before the first step, I have a 100% chance that the Walker is in that particular position. So I need a 20% chance for the vertical transaction.

For point (1, 0), I now have an 80% (4/5) chance of hiker be at that point. So, I need that, of these 80%, 25% (1/4) shoot the vertical transaction (getting a global total of 80% x 25% = 20%). So, being at the point (1, 0), we have:

  • 1/4 chance to rise
  • 3/4 possibility to go right

Similarly, only 3/5 chance of the Walker will reach the point (2, 0), so the vertical transaction must be fired 1/3 of the time. In position (3, 0), it will be 2/5 of chance to be there, so half the shots should be done up. So in the end, we have a 1/5 chance of the Walker being at (4, 0), so we need 100% of the shots up. As with any luck this point is at the end of the grid, it is coerced to go up even.

The general form in this case is:

  • chance of vertical firing at position (i, 0): 1/(5 - i)
  • chance of horizontal firing at position (i, 0): 1 - 1/(5 - i) = (4 - i)/(5 - i)

For the case of the second shot... well, I have no encouraging news regarding balancing. Let's take the position (4, 1). It will always fire upwards, so the overall chance of this transaction being fired is equal to the overall chance of the Walker being in position (4, 1) after 5 steps. Computing only the chance to exit the state (4, 0), we have a 20% chance. So we have the 20% + a value S*X of the Walker being in position (3, 1) (probability of S) and if move horizontally (trigger probability X). Since the expected resulting value of the output is 20%, and we also know that 20% = 20% + S*X, then we have to S*X = 0. Since there is a non-zero chance that the Walker will pass through this position, then X must be zero.

That is, if I want the probability of global firing of vertical transactions to reach the third level, it is necessary to override the possibility of horizontal firing at the second level. This result can be achieved at from an extrapolation of what was obtained with X in the previous paragraph. But this is not at all desired. Maybe we should reverse the horizontal and vertical firing probabilities of the previous level's shots?

So, following this methodology, this is my firing table:

Ponto  V   H
-------------
(0,0) 1/5 4/5
(1,0) 1/4 3/4
(2,0) 1/3 2/3
(3,0) 1/2 1/2
(4,0)  1   0
(0,1) 4/5 1/5
(1,1) 3/4 1/4
(2,1) 2/3 1/3
(3,1) 1/2 1/2
(4,1)  1   0
(0,2)  0   1
(1,2)  0   1
(2,2)  0   1
(3,2)  0   1
(4,2) DESTINO

In this table, column Ponto indicates the probability of firing from that point, column V the probability of vertical firing and H of firing vertical.

Walking erratically

For this walk, just throw a probability and counter with the table above. To generate a number worthy of this probability, just call the method NextDouble() of an object of class Random, if it is less than the value of the table for the vertical transaction, then it fires vertically. Otherwise, shoot horizontally.

For example:

  • if the value 0.2 is obtained, exiting the source, then the transaction followed will be to the right
  • if 0.49 is obtained from position (3,0), then it will go up

With this, in a 6-step loop, we get a path that leaves the origin and arrives at the destination (4,2).

Path example:

an example of a shooting sequence, right-up-right-right-up-right

Semantics of the wandering path

I promised I would explain this after I finished modeling, didn't I?

Let's go back to the first picture:

the first grid image again

Did you notice her axes? The horizontal axis is C, while the vertical is C with the negation bar on top, therefore não-C. This means that a step taken to the vertical consists in the creation / consumption of a letter of type não-C, while in the horizontal it would be the consumption of a C proper. Going back to the binary representation, a vertical step equals 0, while a horizontal one equals 1.

So what would be the sequence generated by the example above?

recap the drawing

Would be 101101, or CBCCAC turning the binary information into the final string.

What if you wanted to get the word whose binary is 101011?

path equivalent to < code>CBCBCC

That would give the word CBCBCC.

With these paths I can also represent two consecutive não-Cs, as in ABCCCC. The equivalent binary is 001111, so the path design would be like this:

path equivalent to < code>ABCCCC

Curiosities of this modeling

As explained in the video about wandering walks , this processing is nothing less than a Markov chain. And Chomsky in one of his seminal papers on formal languages three models for the description of language described how a simple finite Markov process manages to generate a language of infinite words (Section 2). By the way, soon we first paragraphs of this section, Chomsky describes how to create Markov Processes for the Generation of words.

In another Article (on certain Formal Properties of Grammars), Chomsky demonstrates in Theorem 6 the definition 9, the definition implies that finite Markov Processes generate a regular language by issuing characters when triggering a transaction.

Note on reading on Certain Formal Properties of Grammars]14: o which Chomsky defined in this article as regular grammar today we know as Chomsky normal form for context-free grammars

Now, aren't we firing characters when making transactions? Is this character 0 or 1? So we have here a Markov process, with finite amount of states, that fires characters. So, without even realizing it, we end up creating the deterministic finite automaton that recognizes the words of your language. Note that at the level of word recognition, the probability of triggering the Markov process can be summarily ignored. The fact that it is represented by a regular language means that to recognize the desired words it also has an underlying regular expression.

Seems like too much of a complication of this wandering walk thing, doesn't it?

Appearances can deceive. In the case, the available space to be wandering it has more possibility for variations in what you are wanting to recognize/generate than the previous solution.

For example, if you need to recognize with at least one não-C and at most two não-C, with 6 letters in all? Then we would have added the following points to the grid:

  • (5,0)
  • (5,1)

But it makes no sense to add (5,2), as it would take 7 letters (limit is 6 letters).

If appropriate repetition of this structure? For example, after finding a sequence like this, you can have one, two, or infinite sequences that follow the same formation logic. So we're dealing with Kleene star about this language of As, Bs and Cs, limited to 6 characters in all etc.

Not to mention, graphically it gets prettier.

So why not just present this answer option? Because this, although more complete and widespread, is not the most simple. So I decided to leave both options. And also start talking about the simplest to introduce the concepts to be used in this most complete part sounds more natural to the reader.

Enumerating the strings

I presented two generation abstractions. You can use both to generate the enumeration of all possible words. But I find it particularly easier the alternative of the two data.

Since it's just two 6-sided dice, I could simply iterate through all 36 possibilities and, in each of them, check whether it is a valid release or not. Or else I can be smarter and only generate valid possibilities.

To be valid, the second number needs to happen after the first and necessarily be greater. Why this difference now, since in the algorithm presented above did so much? Because (3,1) is equivalent to (1,3) and assuming this limitation becomes easier program.

Recap, I can encapsulate the generation routine described in the previous section into a function:

public static string geraPalavra(int a, int b) {
    string cadeia = "";
    for (int i = 0; i < 6; i++) {
      if (i == a || i== b) {
        // caso seja um dos números aleatórios, então é não-C
        if (i % 2 == 0) {
          // posições pares não-C são dedicadas a A
          cadeia += "A";
        } else {
          // caso ímpar, B
          cadeia += "B";
        }
      } else {
        // caso contrário, preencha com C
        cadeia += "C";
      }
    }
    return cadeia;
}

And then it would be enough to call this function with the properly filled arguments to enumerate all the possible strings;

for (int a = 0; a < 5; a++) {
    for (int b = a + 1; b < 6; b++) {
        string novaPalavra = geraPalavra(a, b);
        // faz algo com essa string
    }
}
 5
Author: Jefferson Quesado, 2018-03-06 02:24:20

I think the morning for this question is to create a vector with the lettering and be drawing them inside the vector and checking the previous character and if the C has appeared four times, see if the code below meets you:

//vetor de char com as letras
            var letras = new char[3];
            letras[0] = 'A';
            letras[1] = 'B';
            letras[2] = 'C';
            var resposta = "";
            int numC = 0;
        while (resposta.Length < 6)
        {
            //gerando um aleatório dentro do intervalo do vetor
            var random = new Random();
            var i = random.Next(letras.Length);

            //caso a resposta tenha tamanho 0 qualquer letrar serve
            if (resposta.Length == 0)
            {
                resposta += letras[i];
                continue;
            }

            if (letras[i] == 'C')
                numC++;

            //o C sempre pode ser usado sempre até quatro vezes, mas o A só pode quando o anterior for B e vice versa
            if ((letras[i] == 'C' && numC < 4) || resposta[resposta.Length - 1] != letras[i])
                resposta += letras[i];
        }

        Console.Write(resposta); 
 0
Author: Leandro Siqueira Dias, 2017-10-25 11:41:13

Try This Solution:

private static string Seleciona() {
    var origem = new List<char>() { 'A', 'B', 'C', 'C', 'C', 'C' };
    var resultado = new StringBuilder();
    int indice;
    var rnd = new Random(GerarSeed());
    for (int i = (origem.Count - 1); i >= 0; i--) {
        indice = rnd.Next(0, i);
        resultado.Append(origem[indice]);
        origem.RemoveAt(indice);
    }
    return resultado.ToString();

    int GerarSeed() {
        using (var rgn = new RNGCryptoServiceProvider()) {
            var rno = new byte[20];
            rgn.GetBytes(rno);
            return BitConverter.ToInt32(rno, 0);
        }
    }
}

A vector containing all characters is already predefined. The job is to randomly select the elements, and with each selection, the element is removed from the vector.

To test the Code:

public static void Main(string[] args) {
    for (int i = 0; i < 100; i++) {
        Console.WriteLine(Seleciona());
    }
}
 0
Author: Armindo Gomes, 2017-10-31 15:37:47

This should solve

public static Random random = new Random();

public static string Sortear()
{
    // posição 0, 2, 4
    int a = random.Next(6);

    //posição 1, 3, 5
    int b;
    do 
    {
        b = random.Next(6);
    } while(b == a);

    //string res = "a: "+a+" b: "+b +" -> ";
    string res = "";
    for (int i = 0; i < 6; i++)
    {
        if (i == a && (a % 2 == 0))
        {
            res += "A";
        }
        else if (i == a && (a % 2 == 1))
        {
            res += "B";
        }
        else if (i == b && (b % 2 == 0))
        {
            res += "A";
        }
        else if (i == b && (b % 2 == 1))
        {
            res += "B";
        }
        else 
        {
            res += "C";
        }
    }
    return res;
}

If you want it to display all you just need to change the random by one for

public static string MostrarTodas()
{
    string res = "";
    for (int a = 0; a < 6; a++)
    {
        for (int b = a + 1; b < 6; b++)
        {
            res += "\n";
            for (int i = 0; i < 6; i++)
            {
                if (i == a && (a % 2 == 0))
                {
                    res += "A";
                }
                else if (i == a && (a % 2 == 1))
                {
                    res += "B";
                }
                else if (i == b && (b % 2 == 0))
                {
                    res += "A";
                }
                else if (i == b && (b % 2 == 1))
                {
                    res += "B";
                }
                else 
                {
                    res += "C";
                }
            }
        }
    }
    return res;
}
 0
Author: Christian Beregula, 2018-01-18 10:58:24